VeritasOS / fedx

FedX is a framework for transparent access to Linked Data sources through a federation
8 stars 3 forks source link

Seeing lots of "Repository for endpoint ... not initialized" and "Error executing union operator: null" #5

Closed tfrancart closed 5 years ago

tfrancart commented 5 years ago

I am currently testing FedX. I try to query the following federation :

            List<Endpoint> endpoints = new ArrayList<Endpoint>();           
            endpoints.add( EndpointFactory.loadSPARQLEndpoint("arsol", "http://localhost:7200/repositories/openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr"));
            endpoints.add( EndpointFactory.loadSPARQLEndpoint("aerba", "http://localhost:7200/repositories/openarchaeo?default-graph-uri=http%3A%2F%2Faerba.univ-tours.fr"));
            endpoints.add( EndpointFactory.loadSPARQLEndpoint("referentiels", "http://localhost:7200/repositories/openarchaeo-referentiels"));

With Log4J debug activated, I see a lot of errors like this :

31930 [Join Scheduler-10] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized
31930 [Join Scheduler-10] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized
31930 [Join Scheduler-10] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized
31931 [Join Scheduler-10] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized
31931 [Join Scheduler-10] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized
31931 [Join Scheduler-10] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized
31867 [Join Scheduler-1] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized

And also a few like that :

47057 [Join Scheduler-8] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47099 [Join Scheduler-2] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47110 [Join Scheduler-4] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47118 [Join Scheduler-3] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47132 [Join Scheduler-6] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47212 [Join Scheduler-7] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47229 [Join Scheduler-9] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47244 [Join Scheduler-5] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null

And the query seems to hang and never returns.

Any idea what can cause these 2 errors ? Interestingly, if I remove one of the endpoints in the federation, the query performs fine. The triplestore is GraphDB.

How can I get more debug information on what is currently happing inside FedX ?

FWIW, here is the SPARQL query and its execution plan

SELECT DISTINCT  ?this ?thisLabel
WHERE
  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object> .
    ?this (<http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by>/<http://www.cidoc-crm.org/cidoc-crm/P8_took_place_on_or_within>)|<http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in> ?Site1 .
    ?Site1  a                     <http://www.cidoc-crm.org/cidoc-crm/E27_Site> .
    ?Site1 <http://www.cidoc-crm.org/cidoc-crm/P8i_witnessed>/<http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by> <https://halshs.archives-ouvertes.fr/search/index/q/*/contributorId_i/103825/> .
    OPTIONAL
      { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?thisLabel}
  }
7248 [main] DEBUG com.fluidops.fedx.FedXConnection  - Optimization start
7277 [main] DEBUG com.fluidops.fedx.optimizer.StatementGroupOptimizer  - Join arguments could be reduced to a single argument, replacing join node.
7277 [main] DEBUG com.fluidops.fedx.FedXConnection  - Optimization duration: 29
Optimized query execution plan: 
QueryRoot
   Distinct
      Projection
         ProjectionElemList
            ProjectionElem "this"
            ProjectionElem "thisLabel"
         LeftJoin
            NJoin
               ExclusiveGroup
                  ExclusiveStatement
                     Var (name=this)
                     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
                     Var (name=_const_e4a0c91_uri, value=http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object, anonymous)
                     StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
                  ExclusiveStatement
                     Var (name=_anon_d5e73e6a_4e89_4b72_bfa1_450d50e722b8, anonymous)
                     Var (name=_const_35bfec94_uri, value=http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by, anonymous)
                     Var (name=_const_775f2f14_uri, value=https://halshs.archives-ouvertes.fr/search/index/q/*/contributorId_i/103825/, anonymous)
                     StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
               StatementSourcePattern
                  Var (name=Site1)
                  Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
                  Var (name=_const_9d328667_uri, value=http://www.cidoc-crm.org/cidoc-crm/E27_Site, anonymous)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Faerba.univ-tours.fr, type=REMOTE)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
               StatementSourcePattern
                  Var (name=Site1)
                  Var (name=_const_9c6af60f_uri, value=http://www.cidoc-crm.org/cidoc-crm/P8i_witnessed, anonymous)
                  Var (name=_anon_d5e73e6a_4e89_4b72_bfa1_450d50e722b8, anonymous)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Faerba.univ-tours.fr, type=REMOTE)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
               NUnion
                  ExclusiveGroup
                     ExclusiveStatement
                        Var (name=this)
                        Var (name=_const_440507c3_uri, value=http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by, anonymous)
                        Var (name=_anon_db9424a9_ce5b_4399_9647_687bf8269037, anonymous)
                        StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
                     ExclusiveStatement
                        Var (name=_anon_db9424a9_ce5b_4399_9647_687bf8269037, anonymous)
                        Var (name=_const_e772e6db_uri, value=http://www.cidoc-crm.org/cidoc-crm/P8_took_place_on_or_within, anonymous)
                        Var (name=Site1)
                        StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
                  ExclusiveStatement
                     Var (name=this)
                     Var (name=_const_2dea2a6c_uri, value=http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in, anonymous)
                     Var (name=Site1)
                     StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
            ExclusiveStatement
               Var (name=this)
               Var (name=_const_c9f3cb8c_uri, value=http://www.w3.org/2004/02/skos/core#prefLabel, anonymous)
               Var (name=thisLabel)
               StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
aschwarte10 commented 5 years ago

Hmm, this looks strange. From the error message it appears that one of the Repository instances (i.e. the holding object for the connection to the federation member) is not initialized at the time of evaluation.

It also looks that it is always the same one of the arsol dataset. Is there something special about this particular endpoint?

Can you share a bit of code of how you initialize FedX and how you invoke your query? It looks like the initialization of the endpoint fails at some point, but is unclear when and why. It looks like the initial connection is working though, as the optimizer yields results.

tfrancart commented 5 years ago

Thanks. No, there is nothing special about this endpoint. the only thing special is that it contains a "default-graph-uri" parameter to restrict the queries to a single named graph within the repository. I use GraphDB free edition, which is I think restricted in terms of concurrent queries. I tried lowering the number of threads, but with no luck :

            Config.getConfig().set("joinWorkerThreads", "1");
            Config.getConfig().set("unionWorkerThreads", "1");

I suspect that one query is failing and causes subsequent queries to fail. However I don't see any error on the GraphDB log side. Is there a way to monitor the potential exceptions that may happen during query execution inside FedX ?

Here is the complete piece of code. Note it executes 5 test queries. Queries q1 q2 and q3 execute fine, the error happens on q4. Just before the serie of Exceptions, I see the 2 following lines in the debug :

3220 [FedX Executor-3] DEBUG com.fluidops.fedx.evaluation.join.ControlledWorkerBoundJoin  - Right argument is not an applicable BoundJoinTupleExpr. Fallback on ControlledWorkerJoin implementation: com.fluidops.fedx.algebra.NUnion
4241 [FedX Executor-1] DEBUG com.fluidops.fedx.evaluation.join.ControlledWorkerBoundJoin  - JoinStats: left iter of join #2 had 52504 results.
public class OpenArchaeoFedXTest {

    public void test() throws FedXException {
        BasicConfigurator.configure();
        Logger.getLogger("org.apache.http").setLevel(Level.INFO);
        Logger.getLogger("org.xml.sax").setLevel(Level.INFO);
        Logger.getLogger("org.eclipse.rdf4j.http.client").setLevel(Level.INFO);
        try {
            // Config.initialize("/home/thomas/sparna/00-Clients/MASA/20-Sources/fedx/openarchaeotest.properties");
            Config.initialize();
            Config.getConfig().set("debugQueryPlan", "true");
            Config.getConfig().set("enableMonitoring", "true");
            Config.getConfig().set("monitoring.logQueries", "true");
            Config.getConfig().set("debugWorkerScheduler", "true");
            Config.getConfig().set("joinWorkerThreads", "1");
            Config.getConfig().set("unionWorkerThreads", "1");

            List<Endpoint> endpoints = new ArrayList<Endpoint>();           
            endpoints.add( EndpointFactory.loadSPARQLEndpoint("arsol", "http://localhost:7200/repositories/openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr"));
            endpoints.add( EndpointFactory.loadSPARQLEndpoint("aerba", "http://localhost:7200/repositories/openarchaeo?default-graph-uri=http%3A%2F%2Faerba.univ-tours.fr"));
            // endpoints.add( EndpointFactory.loadSPARQLEndpoint("arsol", "http://localhost:7200/repositories/openarchaeo"));
            endpoints.add( EndpointFactory.loadSPARQLEndpoint("referentiels", "http://localhost:7200/repositories/openarchaeo-referentiels"));

            FedXFactory.initializeFederation(endpoints);

            // tous les 

            // tous les sites avec leur libellé
            String q1 = "SELECT DISTINCT  ?this ?thisLabel\n" + 
                    "WHERE\n" + 
                    "  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E27_Site> .\n" + 
                    "  OPTIONAL\n" + 
                    "  { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?thisLabel}\n" + 
                    "  }";

            // tous les mobiliers trouvés dans un site donné
            // works
            String q2 = "SELECT DISTINCT  ?this ?thisLabel\n" + 
                    "WHERE\n" + 
                    "  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object> .\n" + 
                    "    ?this <http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in>|(<http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by>/<http://www.cidoc-crm.org/cidoc-crm/P8_took_place_on_or_within>) <http://arsol.univ-tours.fr/4DACTION/WFICHEWEB/isiteAA> .\n" +
                    "  OPTIONAL\n" + 
                    "  { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?thisLabel}\n" + 
                    "  }";

            String q3 = "SELECT DISTINCT  ?this ?thisLabel\n" + 
                    "WHERE\n" + 
                    "  { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?Label1 ;\n" + 
                    "           a                     <http://www.cidoc-crm.org/cidoc-crm/E27_Site>\n" + 
                    "    FILTER regex(?Label1, \"tour\", \"i\")\n" + 
                    "    OPTIONAL\n" + 
                    "      { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?thisLabel}\n" + 
                    "  }";

            // tous les mobiliers dans un site découvert par Elisabeth Lorans
            String q4 = "SELECT DISTINCT  ?this ?thisLabel\n" + 
                    "WHERE\n" + 
                    "  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object> .\n" + 
                    "    ?this (<http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by>/<http://www.cidoc-crm.org/cidoc-crm/P8_took_place_on_or_within>)|<http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in> ?Site1 .\n" + 
                    "    ?Site1  a                     <http://www.cidoc-crm.org/cidoc-crm/E27_Site> .\n" + 
                    "    ?Site1 <http://www.cidoc-crm.org/cidoc-crm/P8i_witnessed>/<http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by> <https://halshs.archives-ouvertes.fr/search/index/q/*/contributorId_i/103825/> .\n" + 
                    "    OPTIONAL\n" + 
                    "      { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?thisLabel}\n" + 
                    "  }";

            // site de Elisabeth Lorans dans lesquels des sepultures ont été trouvées
            String q5 = "SELECT DISTINCT  ?this ?thisLabel\n" + 
                    "FROM NAMED <http://openarchaeo.huma-num.fr/federation/sources/arsol>\n" + 
                    "WHERE\n" + 
                    "  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E27_Site> .\n" + 
                    "    ?this <http://www.cidoc-crm.org/cidoc-crm/P8i_witnessed>/<http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by> <https://halshs.archives-ouvertes.fr/search/index/q/*/contributorId_i/103825/> .\n" + 
                    "    ?this <http://www.cidoc-crm.org/cidoc-crm/P8i_witnessed>/<http://www.ics.forth.gr/isl/CRMsci/O19_has_found_object> ?Sepulture2 .\n" + 
                    "    ?Sepulture2  a                  <http://www.cidoc-crm.org/cidoc-crm/E25_Man-Made_Feature> ;\n" + 
                    "              <http://www.cidoc-crm.org/cidoc-crm/P2_has_type>  <https://ark.frantiq.fr/ark:/26678/pcrt795b632nWw>\n" + 
                    "    OPTIONAL\n" + 
                    "      { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?thisLabel}\n" + 
                    "  }";

            // tous les sites avec un mobilier datant de l'antiquité romaine

            String qX = "SELECT DISTINCT  ?this ?latitude\n" + 
                    "WHERE\n" + 
                    "  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object> .\n" + 
                    "    ?this <http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in>|(<http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by>/<http://www.cidoc-crm.org/cidoc-crm/P8_took_place_on_or_within>) <http://arsol.univ-tours.fr/4DACTION/WFICHEWEB/isiteAA> .\n" +
                    "    ?this (<http://www.cidoc-crm.org/cidoc-crm/P53_has_former_or_current_location>|(<http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by>/<http://www.ics.forth.gr/isl/CRMsci/S19_Encounter_Event>/<http://www.cidoc-crm.org/cidoc-crm/P53_has_former_or_current_location>))/<http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?latitude ."+
                    "  }";

            execute(q1);

            MonitoringUtil.printMonitoringInformation();

        } finally {
            FederationManager.getInstance().shutDown();
        }

        System.out.println("Done.");
        System.exit(0);
    }

    public void execute(String q) {
        System.out.println(q);      
        TupleQuery query = QueryManager.prepareTupleQuery(q);
        TupleQueryResult res = query.evaluate();

        while (res.hasNext()) {
            System.out.println(res.next());
        }
    }

    public static void main(String[] args) throws Exception {
        OpenArchaeoFedXTest me = new OpenArchaeoFedXTest();
        me.test();
    }

}
aschwarte10 commented 5 years ago

It is unclear to me currently in which case the underlying connections get closed, such that you get to the observed state. When I find some spare time I will try to dig into this further and try to understand the cause.

aschwarte10 commented 5 years ago

@tfrancart just a quick heads up: I am currently preparing test infrastructure to simulate errorneous server communications. With these tests I identified some smaller issues already (similar to what you have seen), but I have not been able to reproduce your observation yet. WIll continue to look into this and keep you posted

aschwarte10 commented 5 years ago

@tfrancart I have added quite a number of changes to make the error handling (especially in case of concurrent subquery evaluation) more consistent. Hopefully this also improves above situation, and at least gives a more precise cause for the error.

Could you by chance try a new snapshot build from latest master in your environment and let me know about your observations?

aschwarte10 commented 5 years ago

Quite a number of improvements have been added to the FedX code base which should also improve above behavior.

I suggest closing this issue as I do not see concrete tasks any more.

@tfrancart please feel free to re-open or file new issues if your concrete problem still exists