Swirrl / ook

Structural search engine
https://search-prototype.gss-data.org.uk/
Eclipse Public License 1.0

Configure drafter client to pass max-query-timeout #103

Open Robsteranium opened 3 years ago

Robsteranium commented 3 years ago

Stardog times out before queries can finish and this can go unnoticed (if drafter swallows the error).

Drafter asks Stardog to time out queries after 30s (although results may continue to be returned for 90s).

Stardog is configured to allow a maximum time-out of 15 mins.

Drafter will allow privileged users to provide a JWS-signed timeout parameter to their requests to lift the 30s timeout.

We need to generate a key, configure drafter's DRAFTER_JWS_SIGNING_KEY env var, and have the ook.etl/query function pass the signed parameter.
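To illustrate the signing step, here is a minimal sketch of producing a compact JWS with HMAC-SHA256 using only the Python standard library. It is not drafter's actual format: the payload layout (the timeout in seconds as a plain string), the choice of HS256, and the function names `sign_timeout` and `verify_timeout` are all assumptions for illustration, and the real client is Clojure rather than Python. The 900s value corresponds to the 15 minute maximum mentioned above.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS compact serialization requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_timeout(timeout_seconds: int, key: bytes) -> str:
    """Return a compact JWS (HS256). Payload layout is an assumption."""
    header = b64url(json.dumps({"alg": "HS256"}, separators=(",", ":")).encode())
    payload = b64url(str(timeout_seconds).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verify_timeout(token: str, key: bytes) -> int:
    """Check the signature and return the timeout; raise ValueError if it is wrong."""
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode("ascii")
    expected = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature does not match")
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return int(base64.urlsafe_b64decode(padded))


# Hypothetical key; in practice this would come from DRAFTER_JWS_SIGNING_KEY.
token = sign_timeout(900, b"example-signing-key")
```

The resulting token would then be sent as the value of the timeout parameter; exactly how drafter expects it to be encoded should be checked against drafter itself.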

This will leave large results in stasher, but they're stored in a binary format, which is orders of magnitude smaller than the text versions (which run to gigabytes), and they'll only survive until the next publish. Once we're only loading changes (#17), this should be less of a concern.

An alternative would be to make requests directly to Stardog. For this we'd need to install ook on the muttnik box or otherwise tunnel to stardog. We'd also need to change the queries to exclude draft graphs.

Robsteranium commented 3 years ago

Adding JWS signing led to HTTP/2 errors with curl. The max-query-timeout parameter was also missing from drafter's swagger schema, which meant the Martian client didn't want to use it.

Paging the large observation-select.sparql query by graph ought to obviate the need for this, but the problem may recur (perhaps for the expensive codes-used-construct.sparql queries, #69).
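As a rough sketch of what paging by graph could look like: split the list of graph URIs into batches and issue one smaller query per batch, so each stays within the timeout. The `#GRAPHS#` placeholder, the template text, and the function names are hypothetical; the real observation-select.sparql is not reproduced here, and the sketch is in Python rather than ook's own language.

```python
from typing import Iterable, Iterator, List


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def page_query(template: str, graph_uris: Iterable[str], size: int) -> Iterator[str]:
    """Yield one query per batch, binding ?graph through a VALUES clause."""
    for batch in batches(graph_uris, size):
        values = " ".join(f"<{uri}>" for uri in batch)
        yield template.replace("#GRAPHS#", f"VALUES ?graph {{ {values} }}")


# Hypothetical template standing in for the real query file.
TEMPLATE = "SELECT * WHERE { #GRAPHS# GRAPH ?graph { ?s ?p ?o } }"
queries = list(page_query(TEMPLATE, ["http://a", "http://b", "http://c"], 2))
```

Each query in `queries` would be sent separately and the results concatenated; the batch size is a tuning knob between request count and per-query cost.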

Robsteranium commented 1 year ago

We're seeing truncated responses on the code-pipeline now too. It passes with a select limit of 100 but this isn't ideal.

@RicSwirrl has also wondered if the load balancer is truncating responses.