LinkedDataFragments / Server.js

A Triple Pattern Fragments server for Node.js
http://linkeddatafragments.org/

Server does not catch timeout error from SPARQL Endpoint #46

Open larsgsvensson opened 7 years ago

larsgsvensson commented 7 years ago

When a Server.js instance is configured to use a SPARQL endpoint and that endpoint times out, the error is not caught correctly but is re-thrown, which causes an ldf-client to terminate its execution. The Server.js instance writes the following error message:

events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: Error accessing SPARQL endpoint http://dbpedia.org/sparql: ESOCKETTIMEDOUT
    at emitError (/usr/local/lib/node_modules/ldf-server/lib/datasources/SparqlDatasource.js:71:40)
    at Request._callback (/usr/local/lib/node_modules/ldf-server/lib/datasources/SparqlDatasource.js:64:7)
    at self.callback (/usr/local/lib/node_modules/ldf-server/node_modules/request/request.js:186:22)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
    at ClientRequest.<anonymous> (/usr/local/lib/node_modules/ldf-server/node_modules/request/request.js:781:16)
    at ClientRequest.g (events.js:291:16)
    at emitNone (events.js:86:13)
    at ClientRequest.emit (events.js:185:7)
    at Socket.emitTimeout (_http_client.js:620:10)
Worker 12477 died with 1. Starting new worker.
Worker 12896 running on http://localhost:8081/.

The client exits with this message:

svensson@ldslab:~$ node --max_old_space_size=4096 -- /usr/local/bin/ldf-client http://10.69.14.96:8081/DBPedia-SPARQL http://10.69.14.96:8081/WikipediaCitationSources http://10.69.14.96:8081/WikipediaCitationsISBN http://10.69.14.96:8081/DNBTitel http://10.69.14.96:8081/GND -f aen-dbpedia.sparql > aen-dbpedia.json
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: socket hang up
    at createHangUpError (_http_client.js:254:15)
    at Socket.socketOnEnd (_http_client.js:346:23)
    at emitNone (events.js:91:20)
    at Socket.emit (events.js:185:7)
    at endReadableNT (_stream_readable.js:974:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)
RubenVerborgh commented 7 years ago

Thanks for reporting. I think that #45 actually also fixes this. I'll double-check.

larsgsvensson commented 7 years ago

Great, thanks!

RubenVerborgh commented 7 years ago

> When a Server.js instance is configured to use a SPARQL endpoint and that endpoint times out, the error is not caught correctly

This is partially addressed by #45: timeouts from COUNT queries will be ignored (and this was the issue you ran into). Timeouts for data queries, however, should not be ignored, as receiving data is a prerequisite for sending the answer.
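The distinction can be sketched as follows. This is an illustrative sketch, not the actual SparqlDatasource API: a timed-out COUNT only degrades the paging metadata, so it can fall back to an estimate, while a data-query error must surface to the caller.

```javascript
// Hypothetical fallback value: "unknown but large" (an assumption,
// not the value #45 actually uses).
const FALLBACK_ESTIMATE = 1000000000;

// COUNT query: tolerate a timeout by substituting an estimate,
// since the count only feeds the fragment's metadata.
function getCount(runCountQuery) {
  try {
    return runCountQuery();
  } catch (err) {
    if (/ESOCKETTIMEDOUT/.test(err.message)) return FALLBACK_ESTIMATE;
    throw err; // non-timeout errors still propagate
  }
}

// Data query: data is a prerequisite for the response,
// so every error must propagate to the caller.
function getData(runDataQuery) {
  return runDataQuery();
}

// A COUNT timeout yields the estimate rather than a crash:
const count = getCount(() => {
  throw new Error('Error accessing SPARQL endpoint: ESOCKETTIMEDOUT');
});

// ...while the same timeout on a data query surfaces:
let dataError = null;
try {
  getData(() => { throw new Error('ESOCKETTIMEDOUT'); });
} catch (e) {
  dataError = e.message;
}
```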

> but is re-thrown which causes an ldf-client to terminate its execution.

This is expected behavior; if the client cannot receive data from the server because of a server error, the query evaluation process terminates. This is to avoid having an incomplete result set without notice. This behavior is slightly different for federation: there, endpoints are allowed to fail (as long as one of them answers).

larsgsvensson commented 7 years ago

> Timeouts for data queries, however, should not be ignored, as receiving data is a prerequisite for sending the answer.

Since timeouts can occur at any time and often are just short outages, my expectation would be that the ldf-client and/or the TPF server simply retry the query if the SPARQL endpoint or the TPF server times out for whatever reason. It seems that the TPF server is fairly resilient to cases where a SPARQL endpoint sends incorrect Turtle. This error doesn't make the client terminate:

Error: Error accessing SPARQL endpoint http://dbpedia.org/sparql: The endpoint returned an invalid Turtle response.
    at emitError (/usr/local/lib/node_modules/ldf-server/lib/datasources/SparqlDatasource.js:71:40)
    at Object._callback (/usr/local/lib/node_modules/ldf-server/lib/datasources/SparqlDatasource.js:52:11)
    at /usr/local/lib/node_modules/ldf-server/node_modules/n3/lib/N3Parser.js:686:14
    at reportSyntaxError (/usr/local/lib/node_modules/ldf-server/node_modules/n3/lib/N3Lexer.js:270:40)
    at Object._tokenizeToEnd (/usr/local/lib/node_modules/ldf-server/node_modules/n3/lib/N3Lexer.js:109:20)
    at Request.<anonymous> (/usr/local/lib/node_modules/ldf-server/node_modules/n3/lib/N3Lexer.js:338:16)
    at emitOne (events.js:101:20)
    at Request.emit (events.js:188:7)
    at IncomingMessage.<anonymous> (/usr/local/lib/node_modules/ldf-server/node_modules/request/request.js:998:12)
    at emitOne (events.js:96:13)
    at IncomingMessage.emit (events.js:188:7)

> but is re-thrown which causes an ldf-client to terminate its execution.

> This is expected behavior; if the client cannot receive data from the server because of a server error, the query evaluation process terminates. This is to avoid having an incomplete result set without notice. This behavior is slightly different for federation: there, endpoints are allowed to fail (as long as one of them answers).

But that behaviour for federated queries can -- or most likely will -- cause incomplete result sets too, so I don't quite see what the difference is.

RubenVerborgh commented 7 years ago

> often are just short outages

Not with SPARQL endpoints, in my experience. I often see bursts.

> my expectation would be that the ldf-client and/or the TPF server simply retry the query

From the server perspective: it seems counterproductive to retry a request that is timing out. With a SPARQL endpoint, a timeout can signify an overload, so sending another query is likely to overload the server even more. Furthermore, a timeout occurs after 10 seconds; making the client wait much longer for a reply seems very inconvenient (the TPF server itself would appear to be timing out while retrying for another 10 seconds).

From the client perspective: the TPF server did not time out, so there is no reason to retry.

So, the bottom line is: as much as I would like to fix SPARQL query timeouts, I do not think that retrying is an appropriate strategy (but feel free to correct me).

> It seems that the TPF server is fairly resilient to cases where a SPARQL endpoint sends incorrect Turtle.

True, we explicitly built this in, given that invalid Turtle often happened. However, unlike timeouts, this is easily fixed by sending a new request. (And we will probably move away from Turtle, see #47.)

> But that behaviour for federated queries can -- or most likely will -- cause incomplete result sets too, so I don't quite see what the difference is.

You're right. The difference is purely based on what I see in research literature: when evaluating the performance of a single endpoint, full completeness is always assumed (or the query is considered as failed). When evaluating federated querying, both execution time and completeness are reported. This is why I've been more lenient for federated queries; however, I do agree that this difference is arbitrary from the TPF client point of view. We should probably have this as a configurable flag in the client.
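The proposed configurable flag could look roughly like this. The option name `allowIncompleteResults` is hypothetical, not an actual ldf-client option: in lenient (federated-style) mode a failing endpoint is skipped, while in strict mode any endpoint error aborts the whole query rather than returning an incomplete result set without notice.

```javascript
// Sketch under assumed names: each endpoint is modeled as a function
// returning its partial results; `allowIncompleteResults` decides
// whether an endpoint failure is tolerated or fatal.
function evaluateFederated(endpoints, { allowIncompleteResults = false } = {}) {
  const results = [];
  for (const fetchResults of endpoints) {
    try {
      results.push(...fetchResults());
    } catch (err) {
      if (!allowIncompleteResults) throw err; // strict: abort the query
      // lenient: drop this endpoint's contribution and continue
    }
  }
  return results;
}

// Usage: one healthy endpoint, one that times out.
const healthy = () => ['a', 'b'];
const failing = () => { throw new Error('ESOCKETTIMEDOUT'); };

const partial = evaluateFederated([healthy, failing],
                                  { allowIncompleteResults: true });

let strictFailed = false;
try {
  evaluateFederated([healthy, failing]);
} catch (e) {
  strictFailed = true;
}
```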

jamesamcl commented 5 years ago

Why has #45 been merged into the feature-qpf-latest branch? It has nothing to do with quad pattern fragments.

rubensworks commented 5 years ago

feature-qpf-latest and feature-lsd are major reworks of the server regarding its configurability, so we apply significant changes to the server there. We intend to release these changes as a major release asap.