SolidLabResearch / Challenges


Intermediate result sharing among data streams for aggregation #104

Open argahsuknesib opened 1 year ago

argahsuknesib commented 1 year ago

Pitch

This challenge extends the challenge on query containment. Once the query containment challenge is completed, we will have an algorithm that determines whether a registered query is contained in an already registered query. To improve the scalability of the solid stream aggregator, it is crucial to share resources between the streams, because the registered queries are often similar, but not identical, queries over the same data.

Desired solution

The desired solution is an algorithm or approach that exploits the similarities between the queries over multiple streams. In streaming scenarios, the data stream is chopped into windows for processing. Two queries can therefore differ in the query itself, in the size of the window they are evaluated over, or in both, and the data they have in common depends on how their windows overlap. The sharing approach should be able to share resources in the following scenarios:

| Window | Queries |
| --- | --- |
| Same | Different |
| Different | Same |
| Different | Different |
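
To make the "different window, same query" scenario concrete, here is a minimal sketch of one possible sharing technique (pane-based partial aggregation). It is not the aggregator's actual implementation. It assumes tumbling windows aligned at timestamp 0, an average as the aggregation function, and integer timestamps; the names `Partial`, `build_panes` and `window_average` are hypothetical. Each event is aggregated once into a pane whose width is the greatest common divisor of the window sizes, and every window is then answered by merging the panes it covers.

```python
from dataclasses import dataclass
from math import gcd
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Partial:
    """Mergeable intermediate result for an average: running sum and count."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)

    def average(self) -> float:
        return self.total / self.count if self.count else float("nan")


def build_panes(events: List[Tuple[int, float]], pane_width: int) -> Dict[int, Partial]:
    """Aggregate every (timestamp, value) event once into the pane that covers it."""
    panes: Dict[int, Partial] = {}
    for timestamp, value in events:
        index = timestamp // pane_width
        panes[index] = panes.get(index, Partial()).merge(Partial(value, 1))
    return panes


def window_average(panes: Dict[int, Partial], pane_width: int, start: int, width: int) -> float:
    """Answer the window [start, start + width) by merging the panes it covers.

    Assumes start and width are multiples of pane_width.
    """
    result = Partial()
    for index in range(start // pane_width, (start + width) // pane_width):
        result = result.merge(panes.get(index, Partial()))
    return result.average()


# Two registered queries compute the same average over windows of size 10 and 20.
events = [(0, 1.0), (5, 3.0), (12, 5.0), (19, 7.0)]
pane_width = gcd(10, 20)                  # 10
panes = build_panes(events, pane_width)   # shared by both queries

small_first = window_average(panes, pane_width, 0, 10)    # 2.0
small_second = window_average(panes, pane_width, 10, 10)  # 6.0
large = window_average(panes, pane_width, 0, 20)          # 4.0
```

The sum and count are kept instead of the average itself because averages of panes cannot be merged directly, while sums and counts can. The "same window, different queries" scenario would need a different kind of sharing, for example reusing the window content or the result of a containing query.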

Acceptance criteria

The acceptance criterion for this challenge is to implement resource sharing between the streams in the solid stream aggregator and to show the improvement in query execution time by comparing the execution time of the queries with and without resource sharing. The data set used for the evaluation is the DAHCC dataset.
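
As a rough illustration of how such a comparison could be set up, the sketch below times two hypothetical evaluation strategies on the same input and checks that they return the same answers. Everything in it is an assumption made for illustration: the helper names (`timed`, `without_sharing`, `with_sharing`), the sum aggregation, and the synthetic values, which only stand in for DAHCC data. It reports no measured results.

```python
import time
from typing import Callable, List, Tuple


def timed(run: Callable[[], List[float]], repeats: int = 5) -> Tuple[List[float], float]:
    """Run a callable several times; return its last result and the best wall-clock time in seconds."""
    best = float("inf")
    result: List[float] = []
    for _ in range(repeats):
        started = time.perf_counter()
        result = run()
        best = min(best, time.perf_counter() - started)
    return result, best


def without_sharing(values: List[float], widths: List[int]) -> List[float]:
    """Baseline: every query scans its own window (the last `w` values) from scratch."""
    return [sum(values[-w:]) for w in widths]


def with_sharing(values: List[float], widths: List[int]) -> List[float]:
    """Shared: one prefix-sum pass over the largest window serves all queries."""
    recent = values[-max(widths):]
    prefix = [0.0]
    for value in recent:
        prefix.append(prefix[-1] + value)
    # Sum of the last w values = total minus the prefix that precedes them.
    return [prefix[-1] - prefix[max(0, len(recent) - w)] for w in widths]


# Synthetic stand-in for a sensor stream; integer-valued floats keep the sums exact.
values = [float(i) for i in range(10_000)]
widths = [100, 1_000, 5_000]

baseline_result, baseline_seconds = timed(lambda: without_sharing(values, widths))
shared_result, shared_seconds = timed(lambda: with_sharing(values, widths))

# Correctness comes first: sharing must not change the query answers.
assert baseline_result == shared_result
print(f"without sharing: {baseline_seconds:.6f} s, with sharing: {shared_seconds:.6f} s")
```

Taking the best of several runs reduces noise from the machine. A real evaluation would replace the two functions with the aggregator running with and without sharing, and the synthetic values with the DAHCC streams.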

Pointers

As aggregation is still a novel research topic, a number of assumptions were made:

Scenarios

The challenge is part of a larger scenario on Aggregated view on sensitive personal health data streams. The scenario is described in issue 16.

pheyvaer commented 1 year ago

@pbonte Once you are done with the review of the challenge, can you assign it to me? Thanks!