SolidBench / rdf-dataset-fragmenter.js

Fragments an RDF dataset into multiple parts
MIT License

A fragmentation strategy to generate shape indexes #22

Open constraintAutomaton opened 4 months ago

constraintAutomaton commented 4 months ago

This PR adds a fragmentation strategy that generates a shape index file in each pod, along with a shape for each resource type defined in a JSON file. The strategy is inspired by FragmentationStrategySubject: it derives the IRI associated with each quad in the same way, then generates the shape and the index from that IRI.
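The idea of deriving a pod and a resource type from a quad's IRI, then recording which shape applies to which container, can be sketched roughly as follows. This is only an illustration and not the code from this PR: the pod layout (`/pods/<id>/<resourceType>/`), the `shapeFileByResourceType` mapping, the shape file names, and the function names `locate` and `buildShapeIndexes` are all assumptions made for the example.

```typescript
// Hypothetical sketch; none of these names come from the PR or the library.
interface Term { value: string; }
interface Quad { subject: Term; predicate: Term; object: Term; }

// Assumed configuration: resource type (path segment) -> shape file name.
const shapeFileByResourceType = new Map<string, string>([
  ['posts', 'posts_shape.shexc'],
  ['comments', 'comments_shape.shexc'],
]);

// Assumed pod layout: http://host/pods/<podId>/<resourceType>/...
function locate(iri: string): { pod: string; resourceType: string } | undefined {
  const match = /^(https?:\/\/[^/]+\/pods\/[^/]+\/)([^/#]+)/.exec(iri);
  if (!match) {
    return undefined;
  }
  return { pod: match[1], resourceType: match[2] };
}

// For every pod, collect a map from container IRI to the shape IRI that
// describes the resources in that container.
function buildShapeIndexes(quads: Quad[]): Map<string, Map<string, string>> {
  const indexes = new Map<string, Map<string, string>>();
  for (const quad of quads) {
    const location = locate(quad.subject.value);
    if (!location) {
      continue; // The IRI does not follow the assumed pod layout.
    }
    const shapeFile = shapeFileByResourceType.get(location.resourceType);
    if (!shapeFile) {
      continue; // No shape is configured for this resource type.
    }
    let index = indexes.get(location.pod);
    if (!index) {
      index = new Map<string, string>();
      indexes.set(location.pod, index);
    }
    index.set(`${location.pod}${location.resourceType}/`, `${location.pod}${shapeFile}`);
  }
  return indexes;
}
```

In this sketch, quads whose resource type has no configured shape are skipped, so each pod's index only lists the containers that have a shape.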

A strategy for ignoring certain quads, FragmentationStrategyProbabilityQuadHandling, has also been added. Its purpose in my use case is to simulate an environment where some containers don't provide shape information. The strategy can be configured to work per triple or per resource type.
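The difference between the two modes can be illustrated with a small sketch. It is not the implementation of FragmentationStrategyProbabilityQuadHandling: the `createQuadFilter` function, the `Mode` type, the meaning of `probability` (taken here as the chance that a quad is kept), and the injectable `random` source are assumptions made for the example.

```typescript
// Hypothetical sketch; names and semantics are assumptions, not the PR's API.
type Mode = 'triple' | 'resourceType';

// Returns a predicate that tells whether a quad should be handled (true)
// or ignored (false). `probability` is assumed to be the chance of keeping.
function createQuadFilter(
  probability: number,
  mode: Mode,
  random: () => number = Math.random,
): (resourceKey: string) => boolean {
  // Only used in 'resourceType' mode: one cached decision per resource key.
  const decisions = new Map<string, boolean>();

  return (resourceKey: string): boolean => {
    if (mode === 'triple') {
      // Every quad gets its own independent draw.
      return random() < probability;
    }
    let decision = decisions.get(resourceKey);
    if (decision === undefined) {
      // First quad of this resource type: draw once and remember the outcome,
      // so a container either has shape information or has none at all.
      decision = random() < probability;
      decisions.set(resourceKey, decision);
    }
    return decision;
  };
}
```

Passing the random source as a parameter keeps the sketch reproducible in tests. In resource type mode the key could be, for instance, the container IRI, so that all quads of one container share the same outcome.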

The PR includes:

Future work:

rubensworks commented 4 months ago

@constraintAutomaton Before I review, am I correct to assume that your PR does not require anything for @surilindur's PR (https://github.com/SolidBench/rdf-dataset-fragmenter.js/pull/18) ?

coveralls commented 4 months ago

Pull Request Test Coverage Report for Build 9777846474



Totals Coverage Status
Change from base Build 9755783737: 0.0%
Covered Lines: 1051
Relevant Lines: 1051

💛 - Coveralls
constraintAutomaton commented 4 months ago

> @constraintAutomaton Before I review, am I correct to assume that your PR does not require anything for @surilindur's PR (#18) ?

I'm pretty sure that's correct. We use the "same approach", but the two PRs don't depend on one another.

rubensworks commented 4 months ago

Ok good. @surilindur can you confirm? (so we're definitely aligned)

surilindur commented 4 months ago

Yes, this is completely unrelated work. The reason I wanted to wait with the other PR was to see if @constraintAutomaton could implement his approach this way. If that had not been possible, the other PR should probably also have been split into another library, because they do similar things.

constraintAutomaton commented 4 months ago

The probabilistic generation has been moved into a new strategy. Maybe it should be in another component. It also has two modes, one by IRI and another by resource type. Maybe these could be split into two strategies; if so, another directory of strategy wrappers could be created to hold those two strategies together with FragmentationStrategyException and FragmentationStrategyComposite.

constraintAutomaton commented 3 months ago

I want the shape expression to be an IRI instead of a blank node, as it is right now, which is why I turned this into a draft. It should be done quickly (probably tomorrow).

constraintAutomaton commented 3 months ago

I think I prefer finishing all my experiments before merging this.

constraintAutomaton commented 4 days ago

I think the PR can be reviewed, @rubensworks. A foreseeable change would be to have shapes with "dynamic" constraints based on the fragmentation of the data (by date, by country, etc.). But I think that feature can go in another PR if need be.

This feature could also be interesting:

"Create a pod with all the shapes so that shapes can easily refer to other shapes"

But I also think it can be in another PR.