Determining query containment for the registered queries to improve the scalability of the solid stream aggregator.

argahsuknesib commented 1 year ago

Pitch

This challenge is an extension of challenge 84 and part of scenario 16. The solid stream aggregator enables a query agent to maintain a continuous view of the stream stored in the solid pod by registering a query. In the scenario, there can be multiple query clients requesting a continuous view of the stream. A naive approach would be to execute every query registered by the query agents. However, this approach is not scalable. As the queries to be processed by the aggregator are similar, though not identical, queries over common data, it is vital to find the similarities between them and execute only the unique queries to improve the scalability of the aggregator. We will use the DAHCC dataset and the solid stream aggregator to test the query containment algorithm.

Desired solution

The desired solution is to implement a query containment algorithm that determines which queries are contained in other, already registered queries. The algorithm should be able to determine the containment of queries registered in the RSP-QL syntax. The RSP-QL syntax can be simplified to SPARQL syntax by removing the constructs required for stream-based queries, such as window, step, and range. Therefore, the query containment algorithm should also work with SPARQL queries. The developed algorithm should help manage multiple views in the solid project.
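The simplification from RSP-QL to SPARQL described above could be sketched as follows. This is only an illustration under assumptions: the helper name `rspql_to_sparql` is hypothetical, and the regular expressions assume queries of the shape `REGISTER RStream <output> AS ... FROM NAMED WINDOW :w ON STREAM <stream> [RANGE n STEP m] ... WINDOW :w { ... }` with no nested braces inside the window block. A real implementation would more likely use a proper RSP-QL parser.

```python
import re

# Hypothetical patterns for the stream-specific parts of an RSP-QL query.
# They assume a simple query shape and are not a full RSP-QL grammar.
_REGISTER = re.compile(r"\bREGISTER\s+\w+\s+<[^>]*>\s+AS\s+", re.IGNORECASE)
_FROM_WINDOW = re.compile(
    r"\bFROM\s+NAMED\s+WINDOW\s+\S+\s+ON\s+STREAM\s+\S+\s*\[[^\]]*\]\s*",
    re.IGNORECASE,
)
_WINDOW_BLOCK = re.compile(r"\bWINDOW\s+\S+\s*\{([^{}]*)\}", re.IGNORECASE)


def rspql_to_sparql(query: str) -> str:
    """Drop REGISTER, window definitions and WINDOW wrappers (illustrative only)."""
    query = _REGISTER.sub("", query)
    query = _FROM_WINDOW.sub("", query)
    # Keep the graph pattern inside each WINDOW block, drop the wrapper itself.
    query = _WINDOW_BLOCK.sub(lambda m: m.group(1).strip(), query)
    # Normalise whitespace so that the result is a single line.
    return " ".join(query.split())


example = """PREFIX : <http://example.org/>
REGISTER RStream <output> AS
SELECT ?o
FROM NAMED WINDOW :w1 ON STREAM :stream1 [RANGE 10 STEP 2]
WHERE { WINDOW :w1 { ?s :p ?o } }"""

print(rspql_to_sparql(example))
```

The window parameters are discarded here, so two queries that differ only in range or step would look identical after simplification; a containment check for streaming queries would probably need to compare those parameters separately.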

Acceptance criteria

The developed query containment algorithm is employed in the query registry of the solid stream aggregator to determine whether a query newly registered by a query agent is contained in the already registered or executed queries of the query registry.
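A rough sketch of how a registry might use such a containment check is shown below. The class `QueryRegistry`, its `register` method, and the `is_contained` callback are all hypothetical names introduced for illustration; they are not the API of the solid stream aggregator. The containment check is passed in as a function, so any algorithm could be plugged in.

```python
from typing import Callable, Dict, Tuple


class QueryRegistry:
    """Illustrative registry that skips queries contained in a registered one."""

    def __init__(self, is_contained: Callable[[str, str], bool]) -> None:
        # is_contained(new, existing) should return True if `new` is
        # contained in `existing`. Its implementation is an assumption here.
        self._is_contained = is_contained
        self._queries: Dict[int, str] = {}
        self._next_id = 0

    def register(self, query: str) -> Tuple[int, bool]:
        """Return (query id, whether the query has to be executed as new)."""
        for query_id, existing in self._queries.items():
            if self._is_contained(query, existing):
                # Reuse the already registered query instead of adding a new one.
                return query_id, False
        query_id = self._next_id
        self._queries[query_id] = query
        self._next_id += 1
        return query_id, True


# Stand-in check for the example: whitespace-insensitive string equality.
def same_text(new: str, existing: str) -> bool:
    return " ".join(new.split()) == " ".join(existing.split())


registry = QueryRegistry(same_text)
first = registry.register("SELECT ?s WHERE { ?s ?p ?o }")
second = registry.register("SELECT ?s  WHERE { ?s ?p ?o }")
third = registry.register("SELECT ?o WHERE { ?s ?p ?o }")
```

Note that when a new query is strictly contained in a registered one (rather than equivalent to it), the results of the registered query would still need to be filtered before they can answer the new query.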

Pointers

As aggregation is still a novel research topic, a number of assumptions were made:

Long term server-side authenticated sessions have been solved and therefore the authentication part of this challenge is not taken into account.
The containment problem is undecidable over the full SPARQL syntax. Therefore, only a part of the SPARQL syntax is considered.
The registered queries are either in RSP-QL syntax or are SPARQL SELECT queries.
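To illustrate what a restricted fragment could look like, the sketch below checks containment for SELECT queries that consist only of a basic graph pattern. It uses the classical homomorphism test for conjunctive queries: Q1 is contained in Q2 if the variables of Q2 can be mapped onto terms of Q1 so that every triple pattern of Q2 becomes a triple pattern of Q1. The assumptions are set semantics, no OPTIONAL, FILTER, UNION or aggregates, and projected variables compared by position. The function names and the tuple representation of triple patterns are hypothetical; this is not the algorithm chosen for the challenge.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def _extend(mapping: Mapping, src: str, dst: str) -> Optional[Mapping]:
    """Try to map term `src` of the containing query onto term `dst`."""
    if is_var(src):
        if src in mapping:
            return mapping if mapping[src] == dst else None
        extended = dict(mapping)
        extended[src] = dst
        return extended
    # Constants (IRIs, literals) must match exactly.
    return mapping if src == dst else None


def is_contained(q1_vars: List[str], q1_bgp: List[Triple],
                 q2_vars: List[str], q2_bgp: List[Triple]) -> bool:
    """True if Q1 is contained in Q2 (BGP-only queries, set semantics)."""
    if len(q1_vars) != len(q2_vars):
        return False
    mapping: Optional[Mapping] = {}
    # Projected variables of Q2 must map onto those of Q1, position by position.
    for v2, v1 in zip(q2_vars, q1_vars):
        mapping = _extend(mapping, v2, v1)
        if mapping is None:
            return False

    def search(i: int, current: Mapping) -> bool:
        if i == len(q2_bgp):
            return True  # every triple pattern of Q2 has been mapped into Q1
        for target in q1_bgp:
            candidate: Optional[Mapping] = current
            for src, dst in zip(q2_bgp[i], target):
                candidate = _extend(candidate, src, dst)
                if candidate is None:
                    break
            if candidate is not None and search(i + 1, candidate):
                return True
        return False

    return search(0, mapping)


# Q1 asks for typed sensors with a value, Q2 for anything with a value.
q1 = (["?s"], [("?s", ":type", ":Sensor"), ("?s", ":hasValue", "?v")])
q2 = (["?x"], [("?x", ":hasValue", "?y")])
```

With these two queries, `is_contained(*q1, *q2)` holds because Q1 is the more specific query, while `is_contained(*q2, *q1)` does not. The backtracking search is exponential in the worst case, which is expected since containment of conjunctive queries is NP-complete.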

Scenarios

The challenge is part of a larger scenario on an aggregated view on sensitive personal health data streams. The scenario is described in issue 16.

rubensworks commented 1 year ago

Query containment is also needed for mapping queries to indexes such as shapetrees and type indexes, so I'm very interested in this!

pheyvaer commented 1 year ago

@pbonte Once you are done with the review of the challenge, can you assign it to me? Thanks!
