comunica / comunica-feature-link-traversal

📬 Comunica packages for link traversal-based query execution

keep triple provenance as named graphs #123

Open pchampin opened 7 months ago

pchampin commented 7 months ago

Issue type:


Description:

Currently, there is no way to know from which source the link traversal retrieved a given triple. I would like, for example, to be able to ask the following query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * {
    <https://champin.net/#pa> foaf:knows ?p.
    GRAPH ?g { ?p foaf:name ?name }
}

to determine whether the name of a person comes from their own profile or another source.

Of course, I would expect the default graph to be, by default, the merge of all named graphs, so that "flat" queries still work as expected.
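
The expected behaviour can be illustrated with a minimal in-memory sketch. It is not Comunica's API: the `Quad` shape, the helper names, and the example.org data are all hypothetical. Each retrieved triple is stored with the IRI of its source document as its graph, a "flat" pattern is evaluated over the union of all graphs, and a GRAPH ?g pattern additionally binds the source.

```typescript
// Hypothetical model of a traversed dataset; NOT Comunica's API.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string; // IRI of the document the triple was retrieved from
}

const FOAF_NAME = "http://xmlns.com/foaf/0.1/name";
const ALICE = "https://example.org/alice#me";

// Made-up sample data: the same person is named in two different documents.
const quads: Quad[] = [
  { subject: ALICE, predicate: FOAF_NAME, object: "Alice", graph: "https://example.org/alice" },
  { subject: ALICE, predicate: FOAF_NAME, object: "Alice A.", graph: "https://example.org/bob" },
];

// "Flat" pattern { ?p foaf:name ?name }: evaluated over the merge of all
// named graphs, so the source is ignored and duplicate triples collapse.
function namesInUnion(data: Quad[], person: string): string[] {
  const names = data
    .filter(q => q.subject === person && q.predicate === FOAF_NAME)
    .map(q => q.object);
  return names.filter((name, i) => names.indexOf(name) === i);
}

// Pattern GRAPH ?g { ?p foaf:name ?name }: every solution also binds ?g
// to the graph (here: the source document) that contains the triple.
function namesWithSource(data: Quad[], person: string): { name: string; g: string }[] {
  return data
    .filter(q => q.subject === person && q.predicate === FOAF_NAME)
    .map(q => ({ name: q.object, g: q.graph }));
}
```

With this model, comparing `g` against the person's own profile document would tell whether a name comes from their own profile or from another source.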

cc @lecoqlibre @FabienGandon

rubensworks commented 7 months ago

Hi @pchampin 👋

The functionality you are describing is available in this actor: https://github.com/comunica/comunica-feature-link-traversal/tree/master/packages/actor-rdf-resolve-hypermedia-links-traverse-annotate-source-graph

It's not part of the default configuration, but a separate one, which has a corresponding web client here: https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/solid-prov-sources/

We haven't done any experiments with it so far, so we don't know at the moment how much overhead the implementation causes.

There may also be alternative approaches to achieve triple provenance, such as the quoted triples from RDF-star. (This has been on hold for a while, but now that Comunica supports RDF-star, we could theoretically start building such an implementation.)

pchampin commented 7 months ago

Great, thanks @rubensworks.

Is there a way to use the command-line tool with this specific configuration file? (I tried the -c flag, but it does not seem to work...)

rubensworks commented 7 months ago

> Is there a way to use the command-line tool with this specific configuration file? (I tried the -c flag, but it does not seem to work...)

That should be possible using the dynamic variant of the CLI tool (I suspect comunica-dynamic-sparql-link-traversal-solid in your case) and setting the COMUNICA_CONFIG environment variable.

pchampin commented 7 months ago

Thanks again @rubensworks but I had no luck with the config file. Below is the command line I used:

COMUNICA_CONFIG=config-solid-prov-sources.json \
    my-comunica \
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?g { <https://champin.net/#pa> foaf:knows ?p. GRAPH ?g { ?p foaf:name ?name } }" \
    --lenient \
    -l debug 2>/tmp/comunica-log

Note that my-comunica is an alias to comunica-dynamic-sparql-link-traversal-solid.

I get no results. When I remove the GRAPH ?g clause around the second triple pattern, however, I do get results. So the triples are retrieved, but not put in named graphs as I was expecting...

I tested it with both the version installed from NPM (2.10.1) and the version built from the master branch (bb3fa62).

rubensworks commented 7 months ago

@pchampin Could you try again with the flag --unionDefaultGraph?

It seems to be working here with this query: https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/solid-prov-sources/#transientDatasources=https%3A%2F%2Fwww.rubensworks.net%2F&query=SELECT%20DISTINCT%20*%20WHERE%20%7B%0A%20%20%20%20GRAPH%20%3Fsource%20%7B%0A%20%20%20%20%20%20%3Fperson%20foaf%3Aname%20%3Fname.%0A%09%7D%0A%7D

However, it looks like some results have an empty graph binding, so the implementation probably still has some issues. (It's quite old, so things may have broken with more recent changes.)

pchampin commented 7 months ago

I did try with --unionDefaultGraph already, and yes, it provides results, but for the wrong reason... In fact, even with the default configuration AND the --unionDefaultGraph option, I get exactly the same result (with an empty IRI bound to ?g).

My understanding is that, when --unionDefaultGraph is on, the default graph is a read-only view, so simple triples (as opposed to quads) are added to the graph named <> (empty IRI). If anything, the results we get when turning on this option show that the 'annotate-source-graph' actor fails to add the triples to the right named graph...
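
A small sketch of how one might check this on a set of results: it separates solutions whose ?g is the empty IRI (no provenance was attached) from those that carry a real source. The `Row` shape and the function name are hypothetical; this assumes the results have already been converted to plain objects and does not use Comunica's bindings API.

```typescript
// Hypothetical result row: one solution of
// SELECT ?name ?g { ... GRAPH ?g { ?p foaf:name ?name } }
interface Row {
  name: string;
  g: string; // "" stands for the empty IRI <>
}

// Separate solutions that carry a real source graph from those bound to
// the empty IRI, which would indicate that no provenance was attached.
function splitByProvenance(rows: Row[]): { annotated: Row[]; unannotated: Row[] } {
  const annotated: Row[] = [];
  const unannotated: Row[] = [];
  for (const row of rows) {
    if (row.g === "") {
      unannotated.push(row);
    } else {
      annotated.push(row);
    }
  }
  return { annotated, unannotated };
}
```

If every row ends up in `unannotated`, the output is consistent with the triples having been stored as plain triples rather than in per-source named graphs.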

rubensworks commented 7 months ago

Ok, thanks for checking. So something is definitely going wrong in the 'annotate-source-graph' actor then...

pchampin commented 6 months ago

Maybe things have changed since 2 weeks ago, but I now realize that your example above does provide some named graphs after a bunch of empty named graphs!

I can't reproduce this on the command line, though :-(