SolidBench / rdf-dataset-fragmenter.js

Fragments an RDF dataset into multiple parts
MIT License

Comments state that they are fragmented with all strategies #25

Closed jitsedesmet closed 3 months ago

jitsedesmet commented 3 months ago

The issue:

When you run the default config, the /profile/card#me document of every pod that contains documents states that its commentsFragmentation is every available strategy at once, like so: (image)

This is incorrect, as each pod chooses one specific way to fragment.

In the code:

After quite some debugging, I found that the culprit is QuadTransformerCompositeVaryingResource, which has some strange behavior.

In particular, it re-transforms quads that were generated by a previous transformation: https://github.com/SolidBench/rdf-dataset-fragmenter.js/blob/master/lib/transform/QuadTransformerCompositeVaryingResource.ts#L80-L91

For a post this is no problem, but for comments it is. A comment will generate a tuple like:

<http://localhost:3000/pods/00000006597069768211/comments/2010-11-04#343597384121> <http://localhost:3000/www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/replyOf> <"http://localhost:3000/www.ldbc.eu/ldbc_socialnet/1.0/data/comm00000000343597384120">

The object of this tuple is a different resource, and thus the transformer selected for it is different. The code now transforms all quads using that new transformer by running: https://github.com/SolidBench/rdf-dataset-fragmenter.js/blob/master/lib/transform/QuadTransformerCompositeVaryingResource.ts#L87-L88 as instructed by: https://github.com/SolidBench/rdf-dataset-fragmenter.js/blob/master/lib/transform/identifier/ResourceIdentifier.ts#L108-L112 . Since a different transformer runs over the same tuples, the unwanted tuples are generated.
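To make the mechanism above concrete, here is a minimal toy model of it. It is not the library's code: the `Triple` type, the `strategyOf` map, the strategy names, and the shortened IRIs are all hypothetical, and the real transformer is considerably more involved. The sketch only shows how dispatching on the object of a replyOf tuple, in addition to its subject, can attach a second strategy marker to the same pod.

```typescript
// Toy model only; none of these names come from rdf-dataset-fragmenter.js.
type Triple = { s: string; p: string; o: string };

// Hypothetical: each resource is assigned exactly one comments strategy.
const strategyOf = new Map<string, string>([
  ["pods/A/comments/1", "byDay"],
  ["data/comm2", "byMonth"],
]);

// A sub-transformer for `strategy` passes the triple through and tags the
// profile of the subject's pod with the strategy it represents.
function transform(strategy: string, t: Triple): Triple[] {
  const pod = t.s.split("/comments/")[0];
  return [t, { s: `${pod}/profile/card#me`, p: "commentsFragmentation", o: strategy }];
}

// With dispatchOnObject = true, the triple is also handed to the
// sub-transformer selected for its object, which mimics the reported behavior.
function run(t: Triple, dispatchOnObject: boolean): Triple[] {
  const terms = dispatchOnObject ? [t.s, t.o] : [t.s];
  const out: Triple[] = [];
  for (const term of terms) {
    const strategy = strategyOf.get(term);
    if (strategy !== undefined) {
      out.push(...transform(strategy, t));
    }
  }
  return out;
}

// Collect the distinct strategies recorded on pod profiles, sorted by name.
function markers(triples: Triple[]): string[] {
  const names = triples.filter(t => t.p === "commentsFragmentation").map(t => t.o);
  return names.filter((name, i) => names.indexOf(name) === i).sort();
}

const reply: Triple = { s: "pods/A/comments/1", p: "replyOf", o: "data/comm2" };
const buggyMarkers = markers(run(reply, true));
const fixedMarkers = markers(run(reply, false));
console.log(buggyMarkers, fixedMarkers);
```

In this model, dispatching on the subject only leaves pod A with the single strategy it was assigned, while dispatching on the object as well adds the strategy that belongs to the comment being replied to.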

Discussion

I'm confused by the behavior, but I suspect it's required somewhere, since it seems to be thought out.

I wonder why you would want to transform the quads generated by the sub-transformers through the same sub-transformer? An easy fix to this problem would be to just not do that, but since the whole structure is quite complex, I suspect this is done for a reason?

jitsedesmet commented 3 months ago

Somewhat related to this: I think each pod currently has the same PostFragmentation as it has CommentsFragmentation. I think this is because of this code: https://github.com/SolidBench/rdf-dataset-fragmenter.js/blob/master/lib/transform/QuadTransformerCompositeVaryingResource.ts#L64-L70

Is this the desired outcome? I feel like making the two fragmentations unrelated will increase the complexity of pods, which might be beneficial?
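As an illustration of the coupling described above, the sketch below assumes (this is an assumption, not something confirmed from the linked lines) that a strategy is picked by hashing only the pod identifier. Under that assumption posts and comments always land on the same index. The `hash` function, the strategy names, and the `kind` salt are hypothetical and only show one way the two choices could be made independent.

```typescript
// Hypothetical strategy names, for illustration only.
const strategies: string[] = ["byDay", "byMonth", "single"];

// Simple deterministic string hash, kept in the unsigned 32-bit range.
function hash(text: string): number {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0;
  }
  return h;
}

// perKind = false: the index depends on the pod only, so posts and comments
// of one pod always share a strategy.
// perKind = true: the kind is mixed into the hash input, so the two choices
// are made independently (they may still coincide by chance).
function pick(pod: string, kind: "posts" | "comments", perKind: boolean): string {
  const key = perKind ? `${kind}:${pod}` : pod;
  return strategies[hash(key) % strategies.length];
}

const pod = "pods/00000006597069768211";
console.log(pick(pod, "posts", false), pick(pod, "comments", false));
console.log(pick(pod, "posts", true), pick(pod, "comments", true));
```

Whether decoupling is actually desirable is a design question for the maintainers; the sketch only shows that the coupling follows from what goes into the hash.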

rubensworks commented 3 months ago

Each pod should indeed be marked with a single comments/posts fragmentation strategy, so this does look like a bug.