comunica / comunica-feature-link-traversal

📬 Comunica packages for link traversal-based query execution

Re-executing a fully cached query is slow #45

Open: rubensworks opened this issue 2 years ago

rubensworks commented 2 years ago

Issue type:


Description:

When the following query is executed from cache (i.e. run a second time), execution still takes more than 10 seconds, even though all files are already available in memory:

https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/solid-default/#datasources=https%3A%2F%2Fdrive.verborgh.org%2Fmovies%2F&query=PREFIX%20schema%3A%20%3Chttps%3A%2F%2Fschema.org%2F%3E%0ASELECT%20*%20WHERE%20%7B%0A%20%20%3Fmovie%20a%20schema%3AMovie.%0A%20%20%3Faction%20a%20schema%3AWatchAction%3B%0A%20%20%20%20%20%20%20%20%20%20schema%3Aobject%20%3Fmovie.%0A%7D&solidIdp=https%3A%2F%2Fdrive.verborgh.org%2F

This is most likely caused by a suboptimal query plan, since the following query, which contains only a single triple pattern, does not have this problem:

https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/solid-default/#datasources=https%3A%2F%2Fdrive.verborgh.org%2Fmovies%2F&query=PREFIX%20schema%3A%20%3Chttps%3A%2F%2Fschema.org%2F%3E%0ASELECT%20*%20WHERE%20%7B%0A%20%20%3Fmovie%20a%20schema%3AMovie.%0A%7D&solidIdp=https%3A%2F%2Fdrive.verborgh.org%2F
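Both links store the SPARQL query URL-encoded in the `query` parameter of the fragment. A rough sketch, using only the Python standard library, that decodes the two links and counts their triple patterns. The helper names `decode` and `count_triple_patterns` are hypothetical, and the pattern count is a naive split on `.` and `;` that only works for simple queries like these; it is not a SPARQL parser.

```python
import re
from urllib.parse import parse_qs, urlsplit

BUILD = ("https://comunica.github.io/comunica-feature-link-traversal-web-clients"
         "/builds/solid-default/")
SLOW_URL = BUILD + (
    "#datasources=https%3A%2F%2Fdrive.verborgh.org%2Fmovies%2F"
    "&query=PREFIX%20schema%3A%20%3Chttps%3A%2F%2Fschema.org%2F%3E%0ASELECT%20*%20WHERE%20%7B"
    "%0A%20%20%3Fmovie%20a%20schema%3AMovie.%0A%20%20%3Faction%20a%20schema%3AWatchAction%3B"
    "%0A%20%20%20%20%20%20%20%20%20%20schema%3Aobject%20%3Fmovie.%0A%7D"
    "&solidIdp=https%3A%2F%2Fdrive.verborgh.org%2F"
)
FAST_URL = BUILD + (
    "#datasources=https%3A%2F%2Fdrive.verborgh.org%2Fmovies%2F"
    "&query=PREFIX%20schema%3A%20%3Chttps%3A%2F%2Fschema.org%2F%3E%0ASELECT%20*%20WHERE%20%7B"
    "%0A%20%20%3Fmovie%20a%20schema%3AMovie.%0A%7D"
    "&solidIdp=https%3A%2F%2Fdrive.verborgh.org%2F"
)


def decode(url: str) -> dict[str, str]:
    """Return the (single-valued) parameters stored in the URL fragment."""
    params = parse_qs(urlsplit(url).fragment)
    return {key: values[0] for key, values in params.items()}


def count_triple_patterns(query: str) -> int:
    """Naive count: split the WHERE body on '.' and ';'.

    Only valid for simple queries (no IRIs, literals or nested groups
    inside the braces).
    """
    body = query[query.index("{") + 1 : query.rindex("}")]
    return sum(1 for part in re.split(r"[.;]", body) if part.strip())


for name, url in [("slow", SLOW_URL), ("fast", FAST_URL)]:
    params = decode(url)
    print(f"# {name}: source {params['datasources']}, "
          f"{count_triple_patterns(params['query'])} triple pattern(s)")
    print(params["query"])
```

The first link decodes to three triple patterns (`?movie a schema:Movie`, `?action a schema:WatchAction`, `?action schema:object ?movie`) that must be joined on `?movie` and `?action`; the second contains only `?movie a schema:Movie`, so no join, and therefore no join order, is involved.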

The likely culprit is the zero-knowledge join order actor (https://github.com/comunica/comunica-feature-link-traversal/tree/master/packages/actor-rdf-join-entries-sort-traversal-zero-knowledge). Either we should change this actor, or we should use a different actor when we do have knowledge about the underlying data (e.g. when it is cached).
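As a toy illustration of why join order can dominate runtime even when every triple is already in memory, here is a plain-Python sketch. It is not Comunica code and does not reproduce how the zero-knowledge actor actually orders join entries. It compares evaluating the slow query's patterns in the order they are written, which joins the two `a` patterns first and builds a movies × watch-actions cross product, against an order chosen from exact pattern cardinalities, which become cheap to compute once the data is cached. The toy data, the helper names (`bind_join`, `cardinality_order`) and the greedy strategy are all hypothetical.

```python
from typing import Optional

Term = str
Triple = tuple[Term, Term, Term]
Binding = dict[str, Term]


def is_var(term: Term) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # An already-bound variable must agree with the triple's term.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def bind_join(order: list[Triple], data: list[Triple]) -> tuple[list[Binding], int]:
    """Nested-loop join of the patterns in the given order.

    Returns the solutions plus the total number of intermediate bindings,
    used here as a rough stand-in for the work done.
    """
    bindings: list[Binding] = [{}]
    produced = 0
    for pattern in order:
        next_bindings: list[Binding] = []
        for b in bindings:
            for t in data:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
        produced += len(bindings)
    return bindings, produced


def cardinality(pattern: Triple, data: list[Triple]) -> int:
    """Exact number of triples matching a pattern (a full scan of cached data)."""
    return sum(1 for t in data if match(pattern, t, {}) is not None)


def cardinality_order(patterns: list[Triple], data: list[Triple]) -> list[Triple]:
    """Hypothetical greedy planner: start with the smallest pattern, then keep
    picking the smallest pattern sharing a variable with those already chosen,
    so that a cross product is only formed when unavoidable."""
    remaining = list(patterns)
    order: list[Triple] = []
    bound: set[Term] = set()
    while remaining:
        connected = [p for p in remaining if bound & {t for t in p if is_var(t)}]
        best = min(connected or remaining, key=lambda p: cardinality(p, data))
        order.append(best)
        bound |= {t for t in best if is_var(t)}
        remaining.remove(best)
    return order


# Hypothetical toy data: 50 movies and 20 watch actions, each about one movie.
MOVIE, WATCH, OBJECT = "schema:Movie", "schema:WatchAction", "schema:object"
data: list[Triple] = (
    [(f"movie{i}", "a", MOVIE) for i in range(50)]
    + [(f"action{j}", "a", WATCH) for j in range(20)]
    + [(f"action{j}", OBJECT, f"movie{j}") for j in range(20)]
)
# The three triple patterns of the slow query, in the order they are written.
patterns: list[Triple] = [
    ("?movie", "a", MOVIE),
    ("?action", "a", WATCH),
    ("?action", OBJECT, "?movie"),
]

naive, naive_cost = bind_join(patterns, data)
planned_order = cardinality_order(patterns, data)
planned, planned_cost = bind_join(planned_order, data)
print(len(naive), naive_cost)      # 20 1070
print(len(planned), planned_cost)  # 20 60
```

Both orders return the same 20 solutions, but with this toy data the written order produces 1070 intermediate bindings (50 movies, then a 50 × 20 cross product, then 20 matches) against 60 for the cardinality-based order. Whether the slowdown in this issue has the same shape would still have to be checked against the plan Comunica actually picks.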

github-actions[bot] commented 2 years ago

Thanks for the suggestion!