Wimmics / corese

Software platform implementing and extending the standards of the Semantic Web.
https://project.inria.fr/corese/

Distributed query process is (or seems to be) missing #138

Closed tokarenko closed 11 months ago

tokarenko commented 1 year ago

Issue Description:

The distributed query feature is (or seems to be) missing from the latest corese distribution. If corese has abandoned this feature, I would be grateful if you could suggest a decent engine for transparent SPARQL federation, that is, federation without SERVICE subqueries.

Bug Details:

The GUI controls presented in http://sparks.i3s.unice.fr/public:kgram_dqp_alban_gaignard are not there, and I am unable to find the corresponding REST endpoints.

Steps to Reproduce:

  1. Roll out corese from latest release.
  2. Try to use distributed query process according to documentation.

Expected Behavior:

Distributed querying is available and usable as described in documentation.

Actual Behavior:

Distributed query is (or seems to be) missing.

Note to Developers:

None

Screenshots/Attachments:

None

remiceres commented 1 year ago

Hello,

Thank you for reaching out! The distributed query feature isn’t present in its original form, but you can achieve similar results using federated queries. Here are some quick steps:

In Code:

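import fr.inria.corese.core.Graph;
import fr.inria.corese.core.query.QueryProcess;
import fr.inria.corese.kgram.core.Mapping;
import fr.inria.corese.kgram.core.Mappings;

// The @federate annotation lists the endpoints the query is sent to
QueryProcess exec = QueryProcess.create(Graph.create());
Mappings map = exec.query("@federate <uri1endpoint1> <uri2endpoint2>\nselect * where {?x ?p ?y}\n");
// Print the list of results
for (Mapping m : map) {
    System.out.println(m);
}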

In Corese-GUI:

Run:

@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}

You can also define and reuse a Federation:

@federation <federationuri> <uri1endpoint1> <uri2endpoint2>

@federation <federationuri>
select * where {?x ?p ?y}

You can also get the provenance of the results:

Add the @provenance keyword to the query:

@federate <uri1endpoint1> <uri2endpoint2>
@provenance
select * where {?x ?p ?y}
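
When the endpoint list is only known at run time, the annotation lines shown above can be assembled by a small helper. This is a minimal Python sketch, not part of Corese: `federate_query` and the `example.org` endpoint URLs are hypothetical names chosen for illustration, and only the `@federate` and `@provenance` annotations come from the examples above.

```python
def federate_query(endpoints, body, provenance=False):
    """Prefix a SPARQL query body with Corese federation annotations.

    Hypothetical helper: it only concatenates text and validates
    neither the endpoint URLs nor the SPARQL syntax.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # One @federate line listing every endpoint in angle brackets
    lines = ["@federate " + " ".join(f"<{url}>" for url in endpoints)]
    if provenance:
        lines.append("@provenance")
    lines.append(body.strip())
    return "\n".join(lines)


# Assumed example endpoints; replace them with real SPARQL endpoint URLs.
query = federate_query(
    ["http://example.org/a/sparql", "http://example.org/b/sparql"],
    "select * where {?x ?p ?y}",
    provenance=True,
)
print(query)
```

The resulting string can be passed to `QueryProcess.query`, pasted into Corese-GUI, or sent to a server.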

Documentation:

For additional information and examples, please refer to the Corese documentation, in particular the Federated Query page.

Let me know if you have more questions.

Best

tokarenko commented 1 year ago

Thank you @remiceres for your quick reply! Your answer made federation usage clearer to me. Please help me with the following further questions:

1) Does corese provide a SPARQL endpoint for federated queries? If not, should I use the Python wrapper to set one up, and how can I alleviate potential performance and scalability issues?
2) Is it possible to define a federation somewhere in the system rather than in a query, so that clients do not have to manage the federated URIs?

remiceres commented 1 year ago

Using Federated Query with Corese-Server

Yes, it is possible to use the federated query with Corese-Server. Here's how to do it:

If your server runs locally and is not public

  1. Start the server with the following command:
java -jar corese-server-4.4.1.jar -p 8083 -su 
  2. Send federated queries to http://localhost:8083/sparql (same as before):
@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}

If your server runs on a public server

If your server is hosted publicly, you can't use the -su option. Therefore, you need to create a profile to explicitly allow which endpoints can be used.

  1. Create a profile file (for example, profile.ttl) with the following content:
prefix st: <http://ns.inria.fr/sparql-template/> 

# List external endpoints allowed
st:access st:namespace
    <uri1endpoint1>,
    <uri2endpoint2>.
  2. Start the server with the following command:
java -jar corese-server-4.4.1.jar -p 8083 -pp profile.ttl
  3. Send federated queries to http://localhost:8083/sparql (same as before):
@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}
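
If the list of allowed endpoints changes often, the profile file from step 1 can be generated instead of edited by hand. The sketch below is an illustration under assumptions: `render_profile` is a hypothetical helper, the `example.org` URLs are placeholders, and the output simply mirrors the `st:access st:namespace` layout shown above.

```python
def render_profile(endpoints):
    """Render the text of a profile.ttl that allows the given endpoints.

    Hypothetical helper mirroring the profile layout shown above.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # Endpoints are comma-separated objects of the same triple
    allowed = ",\n".join(f"    <{url}>" for url in endpoints)
    return (
        "prefix st: <http://ns.inria.fr/sparql-template/>\n"
        "\n"
        "# List external endpoints allowed\n"
        "st:access st:namespace\n"
        f"{allowed} .\n"
    )


# Assumed example endpoints; replace them with the real ones.
profile_text = render_profile(
    ["http://example.org/a/sparql", "http://example.org/b/sparql"]
)
print(profile_text)
```

Writing `profile_text` to `profile.ttl` (for example with `pathlib.Path("profile.ttl").write_text(profile_text)`) gives the file passed to `-pp`.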

Python wrapper

You can attempt to use the Python wrapper, but I wouldn’t recommend it at the moment. Currently, the Python wrapper is just a proof of concept, and I haven’t tested its performance and scalability. We plan to work on its improvement and test scalability in the future.

Defining Federation Outside a Query

Currently, I am unaware of how to define federation outside a query. I will get back to you if I find more information.


If you are developing a program in Python and want to use federated queries, I recommend using Corese-Server and sending queries to it through Python code.
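
As an illustration of that approach, here is a minimal Python sketch that uses only the standard library. It assumes that Corese-Server accepts a form-encoded POST as described by the SPARQL 1.1 Protocol and can return SPARQL JSON results; `build_sparql_request` and `run_query` are hypothetical helper names, and the endpoint placeholders are the ones used earlier in this thread.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol POST request (form-encoded `query` parameter)."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumption: the server can answer with SPARQL JSON results
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    return results["results"]["bindings"]


request = build_sparql_request(
    "http://localhost:8083/sparql",
    "@federate <uri1endpoint1> <uri2endpoint2>\nselect * where {?x ?p ?y}",
)
# run_query() needs a running Corese-Server, so it is not called here.
print(request.get_method(), request.full_url)
```

Keeping request construction separate from the network call makes it possible to inspect or log the exact request before sending it.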

If you have more questions, please don't hesitate to ask.

tokarenko commented 1 year ago

Thank you for the valuable information! I will try to implement it next week. With regard to defining a federation outside a query, this article, which seems to be about corese, contains a promising example:

prefix st: <http://e.g/sparql-template/>
<http://e.g./X/sparql> a st:Federation ;
    st:definition (
        <http://a.b/blazegraph/Y/sparql>
        <https://c.d/fuseki/annotation/sparql>
        <http://i.j/repositories/sparql>
    ) .

In the corese source code there is "federation.ttl" file that seems to be the mentioned "dedicated vocabulary": https://github.com/Wimmics/corese/blob/0dc04a14cb19f7a58584153cdf295771837cc4d9/corese-core/src/main/resources/data/corese/federation.ttl

remiceres commented 1 year ago

I have gathered more information on defining federation outside of a query. You can specify a federation in a configuration file and load it when starting Corese. Here are the steps:

Create a federation file, for instance, federation.ttl, and include the following content:

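prefix st: <http://ns.inria.fr/sparql-template/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Define a federation
<http://example.com/federation> a st:Federation ;
    rdfs:label "example" ;
    st:definition (
        <endpoint1>
        <endpoint2>
    ) .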

Next, create a configuration file named, for example, config.properties, with the content below:

FEDERATION = /path/to/federation.ttl
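
Both files can also be produced by a script, which helps when the federation members come from another system. The following Python sketch is an assumption-laden illustration: `render_federation` and `render_config` are hypothetical helpers, the `example.org` endpoints are placeholders, and the explicit prefix declarations are added on the assumption that the file should be valid standalone Turtle.

```python
def render_federation(federation_uri, label, endpoints):
    """Render the text of a federation.ttl with one st:Federation."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    members = "\n".join(f"        <{url}>" for url in endpoints)
    return (
        "prefix st: <http://ns.inria.fr/sparql-template/>\n"
        "prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "\n"
        f"<{federation_uri}> a st:Federation ;\n"
        f'    rdfs:label "{label}" ;\n'
        "    st:definition (\n"
        f"{members}\n"
        "    ) .\n"
    )


def render_config(federation_path):
    """Render the property file line that points at the federation file."""
    return f"FEDERATION = {federation_path}\n"


# Assumed example values; replace them with real URIs and paths.
federation_text = render_federation(
    "http://example.com/federation",
    "example",
    ["http://example.org/a/sparql", "http://example.org/b/sparql"],
)
config_text = render_config("/path/to/federation.ttl")
print(federation_text)
print(config_text)
```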

Corese-Server:

Launch the server using the command below:

java -jar corese-server-4.4.1.jar -init config.properties

Then, send federated queries to http://localhost:8080/sparql :

@federation <http://example.com/federation>
select * where {
  ?x ?p ?y
}

Corese-GUI:

To start the GUI, use the following command:

java -jar corese-gui-4.4.1.jar -init config.properties

Then, execute the following query:

@federation <http://example.com/federation>
select * where {
  ?x ?p ?y
}

Corese-Command:

Run the command below:

echo "" | java -jar corese-command-4.4.2.jar sparql -if turtle -q "@federation <http://example.com/federation> select * where {?x ?p ?y}" --init config.properties

The echo "" pipe and the -if turtle option are a workaround: the command is not intended to be used without input.
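
From a script, the same invocation can be expressed without a shell pipe. This Python sketch only reuses the arguments of the command line above; `corese_command_argv` and `run_corese_command` are hypothetical helpers, and passing an empty string as standard input is assumed to be equivalent to the `echo ""` workaround.

```python
import subprocess


def corese_command_argv(jar, query, config):
    """Build the argument list of the corese-command call shown above."""
    return [
        "java", "-jar", jar,
        "sparql",
        "-if", "turtle",
        "-q", query,
        "--init", config,
    ]


def run_corese_command(jar, query, config):
    """Run the command with empty standard input and return its output."""
    completed = subprocess.run(
        corese_command_argv(jar, query, config),
        input="",  # assumed to stand in for `echo "" |`
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


argv = corese_command_argv(
    "corese-command-4.4.2.jar",
    "@federation <http://example.com/federation> select * where {?x ?p ?y}",
    "config.properties",
)
# run_corese_command() needs Java and the jar, so it is not called here.
print(argv)
```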

This feature will be incorporated in the upcoming release of Corese-command (4.4.2). If you wish to utilize this feature now, you may compile the current version of Corese-command from the source code in the develop branch.


I am in the process of drafting a documentation page about this feature.

Do not hesitate to reach out if you have further questions.

tokarenko commented 1 year ago

@remiceres, thank you for providing exhaustive information so quickly. I am glad that this valuable feature is going to be released soon. I would be grateful if you could share the following suggestion with the development team: support SPARQL 1.1 clients that don’t (or can’t) add the @federation statement to their queries, for example packaged distributions whose SPARQL 1.1 clients can’t be easily modified to include it. I suggest considering a configuration option for corese that enables federation by default for all queries.

ocorby commented 1 year ago

Hi,

You will find some hints in this doc: https://files.inria.fr/corese/doc/service.html (section "Federated SPARQL endpoint").

If you define a federation like this:

<http://myserver.fr/myname/federate> a st:Federation ;
    st:definition (
        ...
    )

then a SPARQL query sent to the corese SPARQL endpoint URL http://myserver.fr/myname/federate will be processed as a federated query. A federated engine splits and rewrites the query appropriately and computes it with the endpoints of the federation.

In addition, there is a set of properties to tune the federated engine; see the SERVICE_... and FEDERATE_... properties in the attached file.

It is an R&D engine, aka work in progress.

Best regards,
Olivier

Attached file:

#
# Corese configuration
# Property file interpreted by corese.core.util.Property
# java -jar corese-gui.jar -init property.properties
# java -jar corese-server.jar -init property.properties
# Property.load("property.properties");
# Property.set(LOAD_IN_DEFAULT_GRAPH, true);
# Property.init(graph);
#

VARIABLE = vis=fr.inria.corese.core.visitor.solver;home=./;fed=/user/corby/home/AADemoNew/federate;sys=/user/corby/home/AAData/query;db=/user/corby/home/AADemoNew/storage

IMPORT = ./gui.properties

STORAGE_MODE = dataset
#STORAGE_MODE = db
#STORAGE_MODE = db_all

# manager handle edge index i with kg:rule_i
RULE_DATAMANAGER_OPTIMIZE = true
# replace kg:rule_i by kg:rule
RULE_DATAMANAGER_CLEAN = false
# transitive closure rule computed by function
RULE_TRANSITIVE_FUNCTION = true
# edge iterator filter integer edge index
RULE_DATAMANAGER_FILTER_INDEX = true
RULE_TRACE = true

#STORAGE = jena_tdb1,jenaowl,/user/corby/home/AADemoNew/storage/go
STORAGE = jena_tdb1,jenamap,/user/corby/home/AADemoNew/storage/map;java,mapdatamanager,mapdatamanager?path=/user/corby/home/AADemoNew/map/insert.rq&param=/user/corby/home/AADemoNew/map/map.json&load=/user/corby/home/AADemoNew/map/schema.ttl
#;java,loaddm,loaddm?path=/user/corby/home/AADemoNew/db/load.rq;java,mapjson,mapjson?mode=ldscript&path=/user/corby/home/AADemoNew/map/ldscript.rq;jena_tdb1,$db/map,$db/map;jena_tdb1,$db/indexcard,$db/indexcard;jena_tdb1,$db/indexprop,$db/indexprop;jena_tdb1,$db/indexmore,$db/indexmore

# ==== USAGE ====
# STORAGE = TYPE_BD1,ID_DB1,PARAM_BD1;TYPE_BD2,ID_DB2,PARAM_BD2
#
# Each DB is defined as follows:
# - A DB type (eg: jena_tdb1, rdf4j_model, corese_graph, java)
# - An ID that identifies the DB in SPARQL queries
# - (Optional) the parameters passed to the DataManager constructor
# ===============

# ==== EXAMPLE ====
# STORAGE = jena_tdb1,jena,$db_path/music;rdf4j_model,rdf4j;corese_graph,corese
# =================

# STORAGE = rdf4j_model=$db/human
#$db/human
#/tmp/tmp

BLANK_NODE = _:b

# display ex:test vs
DISPLAY_URI_AS_PREFIX = true

# rdf star reference node displayed as nested triple
DISPLAY_AS_TRIPLE = true

# Graph node is instance of IDatatype (one object) or Node(IDatatype) (two objects)
GRAPH_NODE_AS_DATATYPE = false

# graph ?g { } iterate external named graph
EXTERNAL_NAMED_GRAPH = true

# load in kg:default or in file path as named graph
LOAD_IN_DEFAULT_GRAPH = true

# skolemize bnode as URI
SKOLEMIZE = false

GRAPH_INDEX_END = true

# run corese with rdf* prototype extension
RDF_STAR = false
RDF_STAR_TRIPLE = false
# select target nested triple for asserted triple pattern
RDF_STAR_SELECT = false
# physically delete triple with reference
RDF_STAR_DELETE = false

# clean OWL graph before OWL RL using update queries
OWL_CLEAN = true

# constraint rule entailment in kg:constraint named graph
CONSTRAINT_NAMED_GRAPH = true
# constraint rule entailment in external kg:constraint named graph
CONSTRAINT_GRAPH = true

# Specific processing of transitive rule
RULE_TRANSITIVE_OPTIMIZE = true

# additional queries for cleaning OWL
#OWL_CLEAN_QUERY = /user/corby/home/AAData/query/clean/test.rq

# user defined OWL RL rule base
#OWL_RL = /user/corby/home/AAData/rule/owlrl.rul

# when true: distinct decimal and integer, distinct string and literal, ...
# used for w3c test case compliance
SPARQL_COMPLIANT = false

# enable update during query for micro services
REENTRANT_QUERY = false

# rdf triples may be assigned access right
ACCESS_RIGHT = false

# specify user access level
#ACCESS_LEVEL = PUBLIC | RESTRICTED | PRIVATE

# corese trigger events that run ldscript functions
EVENT = false

# Visitor for trace
#RULE_VISITOR = $vis.QuerySolverVisitorRuleUser
#SOLVER_VISITOR = $vis.QuerySolverVisitorUser
#TRANSFORMER_VISITOR = $vis.QuerySolverVisitorTransformerUser
#SERVER_VISITOR = fr.inria.corese.server.webservice.QuerySolverVisitorServerUser

#
# Test, debug
#

VERBOSE = false
SOLVER_DEBUG = false
TRANSFORMER_DEBUG = false
TRACE_MEMORY = false
LOG_NODE_INDEX = false
LOG_RULE_CLEAN = false

# draft: trace var in owl rl checker: trace_sttl_undo=true
LDSCRIPT_VARIABLE = mapsize=maplarge;mapzoom=6

# generic property, not used
INTERPRETER_TEST = false
TRACE_GENERIC = false

# take property cardinality into account to sort query pattern
SOLVER_SORT_CARDINALITY = false

# see fr.inria.corese.sparql.triple.function.term.TermEval term evaluator overload
SOLVER_OVERLOAD = false

# enable advanced prototype query planner (todo)
# std | advanced
SOLVER_QUERY_PLAN = std

LDSCRIPT_DEBUG = false
# check xsd datatype of arguments at function call
LDSCRIPT_CHECK_DATATYPE = false
# check rdf:type of arguments at function call
LDSCRIPT_CHECK_RDFTYPE = false
# ldscript function max number of parameters
FUNCTION_PARAMETER_MAX = 15

# values filter
SERVICE_BINDING = values
# split variable bindings
SERVICE_SLICE = 500
# limit added to service unless service has a limit
#SERVICE_LIMIT = 5000
SERVICE_TIMEOUT = 5000
# add parameter to service url
#SERVICE_PARAMETER = mode=link;debug;show&transform=st:xml&format=json
# service may return RDF graph as result
# when true: execute service query locally on this graph
SERVICE_GRAPH = false
# service parameter sent to endpoint
SERVICE_SEND_PARAMETER = true
# generate service evaluation report
SERVICE_REPORT = false
# max number of results displayed in debug/trace/log mode
SERVICE_DISPLAY_RESULT = 10
# when there is a parse error
SERVICE_DISPLAY_MESSAGE = true
# service http header recorded in log and displayed by logger
#SERVICE_HEADER = *
SERVICE_HEADER = X-SPARQL-MaxRows;Server;Content-Type

# define federation for federated query
#FEDERATION = /user/corby/home/AAData/data/corese/federation.ttl

# generate partition of connected bgp
FEDERATE_BGP = true
# do not split complete partition
FEDERATE_PARTITION = true
# complete with triple alone
FEDERATE_COMPLETE = false
# source selection with filter
FEDERATE_FILTER = true
# filters used in source selection in addition to predefined list
FEDERATE_FILTER_ACCEPT = !=
# reject filters from predefined list
FEDERATE_FILTER_REJECT = test
# source selection with bind (exists {t1 . t2} as ?b_i)
FEDERATE_JOIN = true
FEDERATE_JOIN_PATH = true
# exploit join on optional ; require FEDERATE_BGP = true
FEDERATE_OPTIONAL = true
# exploit join on minus ; require FEDERATE_BGP = true
FEDERATE_MINUS = true
# skip undefined arg of union optional minus
FEDERATE_UNDEFINED = true

FEDERATE_BLACKLIST = http://ldf.fi/warsa/sparql;http://data.semanticweb.org/sparql;http://biordf.net/sparql
FEDERATE_BLACKLIST_EXCEPT = http://corese.inria.fr/sparql;http://prod-dekalog.inria.fr/sparql;https://dbpedia.org/sparql;https://query.wikidata.org/sparql;http://fr.dbpedia.org/sparql

# max number of endpoint url returned by source discovery
FEDERATE_INDEX_LENGTH = 500
# success rate to accept endpoint url in source discovery
FEDERATE_INDEX_SUCCESS = 0.5

# query pattern for source discovery
FEDERATE_QUERY_PATTERN = http://prod-dekalog.inria.fr/sparql=$sys/indexpatternendpointall.rq;store:/user/corby/home/AADemoNew/storage/indexcard=$fed/indexpattern/localindexpatternendpointall.rq;http://localhost:8080/index=$fed/testindex/indexpatternendpoint.rq;http://d2kab.fr=$fed/indexpattern/indexqueryd2kab.rq

# predicate pattern for source discovery
FEDERATE_PREDICATE_PATTERN = http://prod-dekalog.inria.fr/sparql=$sys/indexpredicate.rq;store:/user/corby/home/AADemoNew/storage/indexcard=$fed/indexpattern/localindexpredicate.rq;http://localhost:8080/index=$fed/testindex/indexpredicate.rq;http://d2kab.fr=$fed/indexpattern/indexpatternd2kab.rq;

# predicates used to split connected bgp in two subparts
FEDERATE_SPLIT = owl:sameAs

# predicates to be skipped during source discovery
#FEDERATE_INDEX_SKIP = http://rdfs.org/ns/void#

#
# Dataset
#

# limit number of triples loaded from any rdf document
#LOAD_LIMIT = 100000
# load take ?format=rdfxml into account
#LOAD_WITH_PARAMETER = true
# header Accept for load http
#LOAD_FORMAT = text/turtle;q=1.0, application/rdf+xml;q=0.9, application/ld+json;q=0.7; application/json;q=0.6
#LOAD_FORMAT = application/rdf+xml

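
To review only the settings that tune the federated engine, the active SERVICE_... and FEDERATE_... entries of such a property file can be filtered out with a short script. This Python sketch is a simplified reader written for illustration, not the loader Corese itself uses: `read_properties` and `select_by_prefix` are hypothetical helpers, and they assume one `KEY = value` entry per line with `#` starting a comment line.

```python
def read_properties(text):
    """Parse `KEY = value` lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as "!=" survive
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def select_by_prefix(props, prefixes=("SERVICE_", "FEDERATE_")):
    """Keep only the properties whose name starts with one of the prefixes."""
    return {k: v for k, v in props.items() if k.startswith(prefixes)}


# A few entries in the style of the property file above
sample = """\
# values filter
SERVICE_BINDING = values
#SERVICE_LIMIT = 5000
SERVICE_TIMEOUT = 5000
FEDERATE_FILTER_ACCEPT = !=
VERBOSE = false
"""
settings = select_by_prefix(read_properties(sample))
for name, value in settings.items():
    print(f"{name} = {value}")
```

Commented-out entries such as `#SERVICE_LIMIT` are skipped, so the output lists only the values that are actually in effect.
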
tokarenko commented 1 year ago

Thank you, @ocorby ! Now with your advice my requirements should be completely satisfied. I will try to set up the corese and provide feedback.