comunica / comunica-feature-link-traversal

📬 Comunica packages for link traversal-based query execution

`@comunica/query-sparql-link-traversal-solid` does not work when querying someone else's public Pod #57

Closed by Maximvdw 2 years ago

Maximvdw commented 2 years ago

Queries to public Solid Pods using @comunica/query-sparql-link-traversal work perfectly fine.

This can be tested at https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/default/ with the default "common friends of ..." query.

Issue

Running the exact same query against the exact same Pod with @comunica/query-sparql-link-traversal-solid returns no results.

The example can be easily tested using: https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/solid-default/

The default query "Common friends..." does not return any results, regardless of whether you are logged in (provided you are not logged in as the owner of the dataset). Since the dataset is public, as can be verified with the /default/ example, it is odd that no results are returned with /solid-default/.

Expected behaviour

Intuitively, I would expect @comunica/query-sparql-link-traversal-solid to behave like @comunica/query-sparql-link-traversal, with the added option to use the session of a logged-in user to access private containers.

rubensworks commented 2 years ago

Yeah, the solid-specific config uses some different algorithms for traversal. See https://github.com/comunica/comunica-feature-link-traversal/blob/master/engines/config-query-sparql-link-traversal/config/config-default.json vs https://github.com/comunica/comunica-feature-link-traversal/blob/master/engines/config-query-sparql-link-traversal/config/config-solid-default.json

Everything is still very much in flux to figure out what techniques are useful, and what needs to be the "default".

Happy to hear suggestions though.

Maximvdw commented 2 years ago

Just to sync my thoughts - what is the reasoning behind the different algorithms instead of using the fetch function from the session to do the traversal with lenient mode?

rubensworks commented 2 years ago

> Just to sync my thoughts - what is the reasoning behind the different algorithms instead of using the fetch function from the session to do the traversal with lenient mode?

The different algorithms (can) also use the authenticated fetch (with lenient mode).

The problem with link traversal is that there's a large number of links that could be followed, and the difficulty lies in the question of what links to follow, and in what order, because this has a huge impact on query performance.
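To make the "which links, and in what order" trade-off concrete, here is a minimal sketch of a link queue that caps how many links are accepted and dequeues the highest-scoring link first. The `Link` type, the `LimitedPriorityLinkQueue` class, and the scoring function are hypothetical names chosen for illustration; they are not Comunica's actual interfaces.

```typescript
// Hypothetical link type for illustration; not Comunica's actual interface.
interface Link {
  url: string;
}

// Scores a link; links with higher scores are dequeued first.
type Prioritizer = (link: Link) => number;

class LimitedPriorityLinkQueue {
  private readonly links: Link[] = [];
  private readonly seen = new Set<string>();
  private readonly limit: number;
  private readonly prioritize: Prioritizer;
  private accepted = 0;

  constructor(limit: number, prioritize: Prioritizer) {
    this.limit = limit;
    this.prioritize = prioritize;
  }

  // Returns false when the link was already seen or the limit is reached.
  push(link: Link): boolean {
    if (this.seen.has(link.url) || this.accepted >= this.limit) {
      return false;
    }
    this.seen.add(link.url);
    this.accepted++;
    this.links.push(link);
    return true;
  }

  // Removes and returns the highest-scoring link, or undefined when empty.
  // On equal scores, the link that was pushed first wins.
  pop(): Link | undefined {
    let bestIndex = -1;
    let bestScore = 0;
    this.links.forEach((link, index) => {
      const score = this.prioritize(link);
      if (bestIndex === -1 || score > bestScore) {
        bestIndex = index;
        bestScore = score;
      }
    });
    if (bestIndex === -1) {
      return undefined;
    }
    return this.links.splice(bestIndex, 1)[0];
  }
}
```

Both knobs matter: the limit bounds how much of the Web is explored, while the scoring function decides which documents are fetched before the query can produce its first results.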

(I recently wrote a blog post about this, in case you're interested in this area: https://www.rubensworks.net/blog/2022/01/21/querying-a-decentralized-web/#the-problems-of-link-traversal)

Maximvdw commented 2 years ago

So I assume the idea behind the Solid-specific implementation is that you use it to query information 'relevant' to the Pod and do not traverse too much outside the Pod itself (which would be an issue when you query someone else's Pod).

I will close the issue as https://github.com/comunica/comunica-feature-link-traversal/issues/52 most likely covers this topic. For those interested: for my personal use case, the default link traversal config combined with the Solid base config (which provides the fetch from Solid) seems to do the trick for both logged-in and logged-out scenarios, without too many performance issues.

```json
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql/^2.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@comunica/config-query-sparql-link-traversal/^0.0.0/components/context.jsonld"
  ],
  "import": [
    "ccqslt:config/config-solid-base.json",
    "ccqslt:config/extract-links/actors/content-policies-conditional.json",
    "ccqslt:config/extract-links/actors/quad-pattern-query.json",
    "ccqslt:config/rdf-resolve-hypermedia-links/actors/traverse-replace-conditional.json",
    "ccqslt:config/rdf-resolve-hypermedia-links-queue/actors/wrapper-limit-count.json"
  ]
}
```
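
The "logged in and logged out" behaviour described above boils down to choosing which fetch function the traversal uses. A minimal sketch of that choice follows, assuming a session object that exposes `info.isLoggedIn` and `fetch`; the `SessionLike` and `FetchLike` types and the `selectFetch` helper are hypothetical and only model the idea, not the code inside the Solid base config.

```typescript
// Simplified response and fetch shapes, defined here to keep the sketch self-contained.
interface ResponseLike {
  status: number;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Assumed minimal shape of an authentication session (hypothetical).
interface SessionLike {
  info: { isLoggedIn: boolean };
  fetch: FetchLike;
}

// Uses the session's authenticated fetch when a user is logged in,
// and falls back to a plain, unauthenticated fetch otherwise.
function selectFetch(session: SessionLike | undefined, fallback: FetchLike): FetchLike {
  if (session !== undefined && session.info.isLoggedIn) {
    return session.fetch;
  }
  return fallback;
}
```

With this approach, public documents stay reachable without a login, and private containers become reachable once a session exists.
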
rubensworks commented 2 years ago

> So I assume the idea behind the Solid-specific implementation is that you use it to query information 'relevant' to the Pod and do not traverse too much outside the Pod itself (which would be an issue when you query someone else's Pod).

Yes, that is the idea (for the moment at least, might change in the future).
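
The idea of staying close to a Pod can be illustrated with a small link filter that only keeps links under a given Pod root. This is a simplified sketch under stated assumptions: `isWithinPod` and `filterLinksToPod` are hypothetical helpers, the comparison is a plain string prefix check with no URL normalization, and it does not reflect how the Solid config actually selects links.

```typescript
// Returns true when the link lies under the Pod root.
// The root is given a trailing slash so that "https://pod.example/alice"
// does not match "https://pod.example/alice-other/...".
function isWithinPod(link: string, podRoot: string): boolean {
  const root = podRoot.endsWith('/') ? podRoot : `${podRoot}/`;
  return link.startsWith(root);
}

// Keeps only the links that stay inside the Pod.
function filterLinksToPod(links: string[], podRoot: string): string[] {
  return links.filter((link) => isWithinPod(link, podRoot));
}
```

A filter like this also shows why the original query returned nothing: a "common friends" query has to follow links into other people's Pods, and a traversal scoped to one Pod never gets there.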