eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

[FedX] Configure timeouts using the configuration file #4753

Open ludovicm67 opened 1 year ago

ludovicm67 commented 1 year ago

Problem description

I'm currently using the following to create the FedX repository from a turtle configuration file:

FedXFactory.createFederation(configFile)

My configuration file looks like this:

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix fedx: <http://rdf4j.org/config/federation#> .

<http://example.com/endpoint1> a sd:Service ;
    fedx:store "SPARQLEndpoint";
    sd:endpoint "http://endpoint1:8080/sparql".

<http://example.com/endpoint2> a sd:Service ;
    fedx:store "SPARQLEndpoint";
    sd:endpoint "http://endpoint2:8080/sparql".

<http://example.com/endpoint3> a sd:Service ;
    fedx:store "SPARQLEndpoint" ;
    sd:endpoint "http://endpoint3:8080/sparql".

<http://example.com/endpoint4> a sd:Service ;
    fedx:store "SPARQLEndpoint";
    sd:endpoint "http://endpoint4:8080/sparql".

One of my endpoints is an Ontop endpoint in front of a huge database, and queries to that endpoint are very slow.

Currently, it is so slow that even simple queries hit a timeout (in the logs: java.net.SocketTimeoutException: Read timed out, and a 500 is returned).

I want to know if it is possible to configure timeouts both globally and per endpoint.

Preferred solution

Be able to configure timeouts for the global FedX instance and for each configured endpoint using the configuration file, and have those options documented.

Are you interested in contributing a solution yourself?

No

Alternatives you've considered

No response

Anything else?

In my situation, I know that one endpoint is very slow, so ideally I just want to increase the timeout for that endpoint. If a query takes too long on other endpoints, then throwing the timeout makes sense.

aschwarte10 commented 1 year ago

Please see my response and explanation in https://github.com/eclipse-rdf4j/rdf4j/issues/4752#issuecomment-1692824261 - it is roughly the same.

In my situation, I know that one endpoint is very slow, so ideally I just want to increase the timeout for that endpoint. If a query takes too long on other endpoints, then throwing the timeout makes sense.

The timeout in FedX is a global timeout on the query level, i.e. a max execution time for the entire query. There is no federation-member-specific query timeout. Actually, this would contradict the design of a transparent federation, which aims at producing consistent results as if all data were in a single virtual graph.

This means: I do not see us adding such a federation-member-specific timeout to FedX in RDF4J.

One idea for your use case: you could implement a RepositoryWrapper around the repository representing the specific member, which internally takes care of timeouts.
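To make the wrapper idea a bit more concrete, here is a minimal sketch in plain Java of a decorator that bounds each call to one member by its own timeout. It deliberately does not use RDF4J types: the `Member` interface and the `TimeoutMember` class are hypothetical stand-ins, and a real implementation would presumably extend RepositoryWrapper and apply the same pattern to the calls made on its connections.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for one federation member; not an RDF4J type. */
interface Member {
    String evaluate(String sparql) throws Exception;
}

/** Decorator that applies a member-specific timeout to every call. */
final class TimeoutMember implements Member, AutoCloseable {
    private final Member delegate;
    private final Duration timeout;
    // Daemon threads so a hanging call cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "member-timeout");
        t.setDaemon(true);
        return t;
    });

    TimeoutMember(Member delegate, Duration timeout) {
        this.delegate = delegate;
        this.timeout = timeout;
    }

    @Override
    public String evaluate(String sparql) throws Exception {
        Callable<String> task = () -> delegate.evaluate(sparql);
        Future<String> result = pool.submit(task);
        try {
            // Wait at most `timeout` for this member to answer.
            return result.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker thread
            throw e;
        } catch (ExecutionException e) {
            // Surface the member's own failure rather than the wrapper's.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

Note that this only bounds how long the caller waits. The "Read timed out" error from the original report is raised by the HTTP client's socket timeout, which would have to be raised separately for the slow member.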

ludovicm67 commented 1 year ago

Thank you @aschwarte10 for your answers!

As my need is to have something quite generic, I don't think I will be able to configure a RepositoryWrapper directly from a configuration file.

I would be happy enough for now if I could configure a value for enforceMaxQueryTime from that configuration file.

aschwarte10 commented 1 year ago

Thanks for your feedback.

As stated in https://github.com/eclipse-rdf4j/rdf4j/issues/4752#issuecomment-1692824261 I do not think the configuration options should be mixed with the definition of the federation members (where we have a clear schema to represent this in RDF).

Configuration options are rather key-value pairs, for which a regular properties file would be more appropriate.
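For illustration only, such a properties file could be read with the standard java.util.Properties class. The sketch below assumes a key named `enforceMaxQueryTime`, chosen to match the option discussed in this thread. The `FederationOptions` class is hypothetical and not part of FedX.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical holder for options read from a key-value properties file. */
final class FederationOptions {
    // Assumed key name, mirroring the enforceMaxQueryTime option from the thread.
    static final String MAX_QUERY_TIME = "enforceMaxQueryTime";

    final int maxQueryTimeSeconds;

    private FederationOptions(int maxQueryTimeSeconds) {
        this.maxQueryTimeSeconds = maxQueryTimeSeconds;
    }

    /** Loads the options, falling back to the given value when the key is absent or empty. */
    static FederationOptions load(Reader source, int fallbackSeconds) throws IOException {
        Properties props = new Properties();
        props.load(source);
        String raw = props.getProperty(MAX_QUERY_TIME);
        int seconds = (raw == null || raw.trim().isEmpty())
                ? fallbackSeconds
                : Integer.parseInt(raw.trim());
        return new FederationOptions(seconds);
    }
}
```

A file containing the single line `enforceMaxQueryTime = 120` would yield 120 seconds. The application would then pass that value on when it constructs the federation.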

I personally do not really see the need for introducing such a configuration file, as in the cases where you manually configure your application code and construct the federation, you have full control anyway. There you would typically see application-specific logic for configuration (e.g. in the simplest case by externalizing options via system properties).

On the other hand - as mentioned in my comment - I would see the need for declarative configuration from the repository configuration file, as the application here is typically an RDF4J workbench, i.e. there is no direct control over the repository creation.
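The simplest case mentioned here, externalizing an option via a system property, might look roughly like the following sketch. The property name `app.fedx.enforceMaxQueryTime` and the `TimeoutSettings` class are made up for this example; neither FedX nor RDF4J defines them.

```java
/** Reads a query timeout (in seconds) from a JVM system property. */
final class TimeoutSettings {
    // Hypothetical property name chosen by the application, not defined by FedX.
    static final String MAX_QUERY_TIME_KEY = "app.fedx.enforceMaxQueryTime";

    /** Returns the configured value, or the fallback if it is unset, non-numeric, or negative. */
    static int maxQueryTimeSeconds(int fallback) {
        // Integer.getInteger returns null when the property is missing or not a number.
        Integer configured = Integer.getInteger(MAX_QUERY_TIME_KEY);
        if (configured == null || configured < 0) {
            return fallback;
        }
        return configured;
    }
}
```

The application would be started with something like `-Dapp.fedx.enforceMaxQueryTime=120`, and the code that constructs the federation would feed the returned value into the FedX configuration.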

In case you or others see the need for a dedicated configuration file: would you be interested or able to make a contribution to the project?

ludovicm67 commented 1 year ago

Thank you for your answer.

In my case, there would be no direct control over the repository creation, so in the current state it is not possible to adjust values; we have to deal with some hard-coded values that will not be the best for all situations.

would you be interested or able to make a contribution to the project?

I don't think I will be able to find enough time to work on this. If someone is interested in working on it, that would be great! :)