Open pbonte opened 1 year ago
@s-minoo this feels related to your rmlstreamer benchmark, can you give some pointers?
Yes! I wrote a data streamer/replayer in Rust that consumes historical data and replays it with configurable workload characteristics: periodic burst, constant rate, etc.
If you want to apply it to this challenge, take a look at the two traits of datastreamer-rust: 1) Publisher: responsible for imposing the data stream characteristics (periodic burst, constant rate, etc.). 2) Processor: responsible for parsing the historical data and appending timestamps to the records.
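To make the two workload characteristics mentioned above concrete, here is a minimal Python sketch. It is not the datastreamer-rust API: the function names and parameters are hypothetical, and it only computes when each event would be emitted (as an offset in seconds from the start of the replay).

```python
from __future__ import annotations


def constant_rate_offsets(n_events: int, rate_per_s: float) -> list[float]:
    """Emission offsets for n_events sent at a fixed rate (events per second)."""
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    return [i / rate_per_s for i in range(n_events)]


def periodic_burst_offsets(n_events: int, burst_size: int, period_s: float) -> list[float]:
    """Emission offsets for events sent in bursts of burst_size every period_s seconds."""
    if burst_size <= 0:
        raise ValueError("burst_size must be positive")
    # All events of the same burst share one offset; bursts are period_s apart.
    return [(i // burst_size) * period_s for i in range(n_events)]
```

A publisher built on this idea would sleep until each offset and then emit the corresponding record, while the parsing and timestamping stay in a separate processor step.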
Of course, with the current implementation, there are quite a few limitations that I can think of right now:
publisher and the processor traits)
Hi, I shall certainly have a deeper look into what you've done to see whether it can be aligned with the goals of Challenge 82/83.
Loading from pre-configured path currently supported.
Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!
Loading of large datasets now supported. Time-based sorting of measurements in large datasets now supported. One-step replay now supported.
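For readers wondering how time-based sorting can scale to large datasets, one common approach is to sort fixed-size chunks and then merge the sorted runs lazily. The sketch below illustrates that idea only; it is not the code of this challenge. The `(timestamp, payload)` record shape and the `sort_by_time` name are assumptions, and the runs are kept in memory here, whereas a real implementation would likely spill each run to disk.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[float, str]  # hypothetical record shape: (timestamp, payload)


def sort_by_time(records: Iterable[Record], chunk_size: int = 100_000) -> Iterator[Record]:
    """Sort records by timestamp using sorted chunks plus a lazy k-way merge."""
    it = iter(records)
    runs = []
    while True:
        # Sort one bounded chunk at a time so only chunk_size records are sorted at once.
        chunk = sorted(islice(it, chunk_size), key=lambda r: r[0])
        if not chunk:
            break
        runs.append(chunk)  # assumption: kept in memory; a real version would write to disk
    # heapq.merge yields records in timestamp order without concatenating the runs.
    return heapq.merge(*runs, key=lambda r: r[0])
```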
@svrstich Great! What is still missing?
What's still missing? ;-) Bulk replay, tunable bulk-size replay, tunable speed replay, etc.
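As a rough illustration of what tunable speed and tunable bulk size could mean in practice, here is a small Python sketch. The function names and the `speed` and `bulk_size` parameters are hypothetical and not taken from the replayer discussed here.

```python
from typing import Iterator, List, Sequence


def scaled_delays(timestamps: Sequence[float], speed: float = 1.0) -> List[float]:
    """Wait time (seconds) before each event; speed=2.0 replays twice as fast."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not timestamps:
        return []
    delays = [0.0]  # the first event is emitted immediately
    for prev, cur in zip(timestamps, timestamps[1:]):
        delays.append((cur - prev) / speed)
    return delays


def bulks(events: Sequence, bulk_size: int) -> Iterator[list]:
    """Yield consecutive groups of at most bulk_size events."""
    if bulk_size <= 0:
        raise ValueError("bulk_size must be positive")
    for start in range(0, len(events), bulk_size):
        yield list(events[start:start + bulk_size])
```

A replay loop could then call `time.sleep(delay)` before emitting each event, or emit one bulk per iteration.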
@svrstich Do you have a complete list? So that we can assess a bit better what still needs to happen?
Not really, but we'll be limiting it for the time being to two/three features.
We'll stick with some example parameters for now: "step-wise" and "everything". Other use-case-specific replay options can be added later on.
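The two modes named here could look roughly like the following Python sketch. The class and method names are hypothetical, and it assumes the events are already loaded and sorted by time.

```python
from typing import Any, List, Optional, Sequence


class StepReplayer:
    """Keeps a cursor over pre-sorted events and replays them on request."""

    def __init__(self, events: Sequence[Any]) -> None:
        self._events = list(events)
        self._pos = 0

    def step(self) -> Optional[Any]:
        """Step-wise mode: return the next event, or None when exhausted."""
        if self._pos >= len(self._events):
            return None
        event = self._events[self._pos]
        self._pos += 1
        return event

    def everything(self) -> List[Any]:
        """'Everything' mode: return all remaining events in one go."""
        remaining = self._events[self._pos:]
        self._pos = len(self._events)
        return remaining
```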
@svrstich Is this the solution for the challenge, a pointer, or work in progress?
First alpha release :-)
@RubenVerborgh Why did you remove the "completion: pending" label? Stijn says that it's ready for review.
My bad; I misunderstood what was written above!
Ok, no problem! I assigned it to you now for review.
Pitch
As many applications use data streams, a way to easily replay captured streams is necessary for demo purposes, scalability testing, or data recovery. Replaying makes it possible to mimic the real-time behaviour of data streams, even though the data is historical. A replayer is a crucial component to showcase how our solutions can process live data streams and how they can handle different data rates. The DAHCC dataset will be used as an example of data to replay.
Desired solution
The replayer should be able to read a number of files and stream out the events described in each file. To facilitate performance and scalability testing, the rates and the number of streams should be configurable.
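As one possible reading of "configurable rates and number of streams", the Python sketch below plans a replay from a small configuration object. Everything in it is an assumption made for illustration: the `ReplayConfig` fields, the `plan` function, and the choice to distribute events round-robin over the streams (one stream per input file would be an equally valid design).

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class ReplayConfig:
    n_streams: int = 1              # hypothetical: number of output streams
    events_per_second: float = 1.0  # hypothetical: rate of each stream


def plan(events: Sequence[Any], config: ReplayConfig) -> List[List[Tuple[float, Any]]]:
    """Assign events round-robin to streams, each with an emission offset in seconds."""
    if config.n_streams <= 0 or config.events_per_second <= 0:
        raise ValueError("n_streams and events_per_second must be positive")
    streams: List[List[Tuple[float, Any]]] = [[] for _ in range(config.n_streams)]
    for i, event in enumerate(events):
        stream = streams[i % config.n_streams]
        # The k-th event of a stream is emitted k / rate seconds after the start.
        stream.append((len(stream) / config.events_per_second, event))
    return streams
```

Raising `n_streams` or `events_per_second` then increases the load on the system under test without changing the input files.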
Acceptance criteria
The replayer should be a library that allows users to:
Scenarios
This is part of a larger scenario