SolidLabResearch / Challenges

Storing large real-time data streams in data pods using LDES #82

Closed pbonte closed 1 year ago

pbonte commented 1 year ago

Pitch

Data streams are becoming omnipresent and are a crucial component in many use cases. Storing streams in a low-cost, file-based web environment could be done using Linked Data Event Streams (LDES). However, pushing large volumes of highly volatile data into a Solid-based LDES is still not possible, because current solutions partition the data only after all of it has been retrieved instead of in a streaming fashion. Repartitioning large amounts of data in this way puts such a high load on the Solid server that it crashes. Data from the DAHCC dataset will be used for this purpose; it contains data streams describing the behaviour of various patients. The streams contain over 100,000 events.

Desired solution

A Solid-based LDES connector that can partition the data in a streaming fashion, i.e. while the data is being retrieved, instead of having to wait until the whole dataset has been received.
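To illustrate what partitioning "in a streaming fashion" could look like, here is a minimal Python sketch. It is not the connector itself: `Event`, `Bucket`, `partition_stream` and the fixed `bucket_size` policy are all hypothetical names and assumptions made for this example, and it assumes events arrive in timestamp order. The point is only that each bucket is emitted as soon as it is full, so at most one bucket is held in memory at a time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, Iterator, List


@dataclass
class Event:
    # Hypothetical stand-in for one observation in the stream.
    timestamp: datetime
    payload: str


@dataclass
class Bucket:
    # Hypothetical stand-in for one LDES fragment stored in the pod.
    events: List[Event] = field(default_factory=list)


def partition_stream(events: Iterable[Event], bucket_size: int) -> Iterator[Bucket]:
    """Yield each bucket as soon as it is full.

    Assumes events arrive in timestamp order. Only the bucket currently
    being filled is kept in memory, never the whole dataset.
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    current = Bucket()
    for event in events:
        current.events.append(event)
        if len(current.events) == bucket_size:
            yield current
            current = Bucket()
    # Emit the last, partially filled bucket if the stream has ended.
    if current.events:
        yield current


if __name__ == "__main__":
    start = datetime(2023, 1, 1)
    # A generator, so events are produced lazily like a real stream.
    stream = (Event(start + timedelta(seconds=i), f"obs-{i}") for i in range(10))
    for bucket in partition_stream(stream, bucket_size=4):
        print(bucket.events[0].timestamp, len(bucket.events))
```

In a real connector each yielded bucket would be written to the pod before the next one is filled; that step is left out here because it depends on the Solid server and the LDES layout in use.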

Acceptance criteria

A connector is required that investigates the LDES, computes the need for a new bucket and adds the events of the stream either to a new bucket or to an existing one based on the timestamps of these events. In more detail the following functionality is required:
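As a rough illustration only (not the list of required functionality, and not the challenge's actual implementation), the bucket decision described in the first sentence above could be sketched in Python as follows. The class `TimeBucketIndex`, the use of integer epoch-millisecond timestamps, and the "open a new bucket when the covering one is full" rule are assumptions made for this sketch; the issue itself does not say how the need for a new bucket is computed.

```python
from bisect import bisect_right, insort
from typing import Dict, List, Tuple


class TimeBucketIndex:
    """Hypothetical in-memory view of an LDES: bucket start time -> event count."""

    def __init__(self, max_events_per_bucket: int) -> None:
        self.max_events = max_events_per_bucket
        self.starts: List[int] = []       # sorted bucket start timestamps (epoch ms)
        self.counts: Dict[int, int] = {}  # number of events stored per bucket

    def add(self, timestamp: int) -> Tuple[int, bool]:
        """Route one event; return (bucket_start, created_new_bucket)."""
        # Find the last bucket whose start is <= the event timestamp.
        pos = bisect_right(self.starts, timestamp)
        if pos > 0:
            start = self.starts[pos - 1]
            # Reuse the covering bucket while it has room. A full bucket that
            # starts at exactly this timestamp is also reused, because a
            # second bucket cannot share the same start.
            if self.counts[start] < self.max_events or start == timestamp:
                self.counts[start] += 1
                return start, False
        # No bucket covers this timestamp, or the covering one is full:
        # open a new bucket that starts at the event's timestamp.
        insort(self.starts, timestamp)
        self.counts[timestamp] = 1
        return timestamp, True


if __name__ == "__main__":
    index = TimeBucketIndex(max_events_per_bucket=2)
    for ts in (1000, 1500, 2000, 1200, 500):
        print(ts, index.add(ts))
```

Keeping the bucket starts sorted lets each event be routed with a binary search, so the decision is made per event as it arrives rather than by repartitioning everything afterwards.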

Scenarios

This is part of a larger scenario

svrstich commented 1 year ago

A streaming RDF/JS implementation is now used for loading large datasets.

svrstich commented 1 year ago

https://gitlab.ilabt.imec.be/svrstich/ldes-in-solid-semantic-observations-replay

pheyvaer commented 1 year ago

@svrstich Is this a pointer for the challenge or is this something else?

svrstich commented 1 year ago

First alpha release :-)

pheyvaer commented 1 year ago

@svrstich Hmmm, that is weird considering that nobody is working on this challenge. Are you working on this one?

RubenVerborgh commented 1 year ago

I don't have enough info; assigning @pietercolpaert to assess completion.

svrstich commented 1 year ago

I have uploaded a screencast for 82/83 as well: https://github.com/SolidLabResearch/LDES-in-SOLID-Semantic-Observations-Replay/blob/main/README.md#screencast

pheyvaer commented 1 year ago

@svrstich I think we are almost there for this challenge! I see there are still 3 pull requests here. Can you have a look at them? Thanks!

svrstich commented 1 year ago

I agree :-) But the three pull requests were all submitted by me. I suppose I should not be accepting my own pull requests, should I?

pheyvaer commented 1 year ago

It depends. Who do you expect to review them? Is it @pbonte?

svrstich commented 1 year ago

I would assume so. I'm in the office tomorrow, so I'll see if I can get him to do this.

svrstich commented 1 year ago

I have provided some input for the open question in https://github.com/SolidLabResearch/Challenges/pull/109

pheyvaer commented 1 year ago

You can find the report for this challenge here.