Currently, RiverBench only includes datasets that can be viewed as a sequence of RDF graphs or RDF datasets (grouped RDF streams). I wanted to focus first on the area where benchmarks were most lacking, and that was grouped RDF streams.
However, every dataset in RiverBench can also be viewed as a flat RDF stream (a sequence of triples or quads). In principle, we could also accept flat-only datasets, which cannot be split into elements. This would broaden the set of tasks to which RiverBench is applicable.
This would require changes in a few places:
Change the schema for dataset metadata.
Update dataset requirements in the documentation and the description of the dataset submission process.
Change the dataset proposal form.
Update the ci-worker to handle such datasets correctly (new mode of dataset loading). This will also require quite a few changes in the Pekko streams used to process the datasets.
Make sure that the category/profile/task system is ready for this. Flat profiles should pick up the new datasets; grouped ones should not. Note that Jelly distributions are available for both types of datasets, so we must be careful to avoid confusion there.
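To make the profile selection rule concrete, here is a minimal sketch in Python. It is only an illustration of the intended behavior, not the actual RiverBench schema or ci-worker code: the `Dataset` class, the `has_elements` flag, and the `datasets_for_profile` function are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class StreamKind(Enum):
    FLAT = "flat"
    GROUPED = "grouped"


@dataclass(frozen=True)
class Dataset:
    name: str
    # Hypothetical metadata flag: True if the dataset can be split
    # into stream elements (RDF graphs or RDF datasets).
    has_elements: bool


def datasets_for_profile(kind, datasets):
    """Return the datasets that a profile of the given kind should include."""
    if kind is StreamKind.FLAT:
        # Every dataset can be read as a flat sequence of triples or quads.
        return list(datasets)
    # Grouped profiles must skip flat-only datasets.
    return [d for d in datasets if d.has_elements]


all_datasets = [
    Dataset("grouped-example", has_elements=True),
    Dataset("flat-only-example", has_elements=False),
]
flat = datasets_for_profile(StreamKind.FLAT, all_datasets)
grouped = datasets_for_profile(StreamKind.GROUPED, all_datasets)
```

With a rule like this, a flat-only dataset would show up in flat profiles only, regardless of which distribution formats it ships in.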