TREEcg / bucketizers

Bucketize Linked Data Event Stream members
MIT License

What should happen when the property path didn’t resolve? #6

Open pietercolpaert opened 2 years ago

pietercolpaert commented 2 years ago

Current situation:

When the property path doesn’t resolve for a member, the bucketizer throws an error.

To-be situation:

When the property path doesn’t resolve for a member, the bucketizer should fall back to an approach similar to the basic bucketizer, so that all the members it cannot bucketize itself still end up in a bucket.
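To make the to-be situation concrete, here is a minimal sketch of a fallback instead of a thrown error. Everything in it is an assumption for illustration: the `Member` type, `resolvePath`, `bucketize`, the `fallback` bucket name and the first-character strategy are hypothetical and not taken from the bucketizers codebase, and the path resolution is reduced to a single predicate lookup.

```typescript
// Hypothetical member model: predicate IRI -> literal values.
interface Member {
  id: string;
  properties: Map<string, string[]>;
}

// Assumed bucket name; the naming convention is still an open question.
const FALLBACK_BUCKET = "fallback";
const LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

// Simplified stand-in for property path resolution (single predicate only).
function resolvePath(member: Member, predicate: string): string[] {
  return member.properties.get(predicate) ?? [];
}

// Returns bucket keys for a member; never throws when the path does not resolve.
function bucketize(member: Member, predicate: string): string[] {
  const values = resolvePath(member, predicate).filter((v) => v.length > 0);
  if (values.length === 0) {
    // Path did not resolve: assign the dedicated fallback bucket.
    return [FALLBACK_BUCKET];
  }
  // Illustrative strategy: bucket by lowercased first character.
  return values.map((value) => value.charAt(0).toLowerCase());
}

const members: Member[] = [
  { id: "m1", properties: new Map([[LABEL, ["Gent"]]]) },
  { id: "m2", properties: new Map<string, string[]>() }, // no label at all
];

// Group member ids per bucket key.
const buckets = new Map<string, string[]>();
for (const member of members) {
  for (const key of bucketize(member, LABEL)) {
    const ids = buckets.get(key) ?? [];
    ids.push(member.id);
    buckets.set(key, ids);
  }
}
```

In this sketch `m1` lands in bucket `g` and `m2` lands in the fallback bucket, so the workflow keeps running for members without the property.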

Test-case: https://github.com/pietercolpaert/coghent-substrings/runs/4107613037?check_suite_focus=true

ddvlanck commented 2 years ago

Agreed that we should have a fallback mechanism.

Looking at the substring bucketizer: in addition to the links to the other pages, will the root page also contain a link to a page (e.g. 0.ttl) that holds the members that could not be bucketized? And if that page (0.ttl) is full, we create another page, for example 1.ttl. Does root.ttl then link to 1.ttl, or should it link to 0.ttl?
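The two options in that question can be compared with a small sketch. It is not a proposal for the actual implementation: the `Page` type, `buildFallbackPages` and the two strategy names are hypothetical, only the fallback pages are modelled (a real root would also link to the regular bucket pages), and the `0.ttl`, `1.ttl` names simply follow the example above.

```typescript
// Hypothetical page model for illustration only.
interface Page {
  name: string;
  members: string[];
  links: string[]; // names of the pages this page links to
}

// "root-links-all": root.ttl links to every fallback page.
// "chain": root.ttl links to 0.ttl only, and each page links to the next one.
type LinkStrategy = "root-links-all" | "chain";

function buildFallbackPages(
  memberIds: string[],
  pageSize: number,
  strategy: LinkStrategy,
): { root: Page; pages: Page[] } {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be a positive integer");
  }

  // Fill pages in order; a new page is started whenever the current one is full.
  const pages: Page[] = [];
  for (let i = 0; i < memberIds.length; i += pageSize) {
    pages.push({
      name: `${pages.length}.ttl`,
      members: memberIds.slice(i, i + pageSize),
      links: [],
    });
  }

  const root: Page = { name: "root.ttl", members: [], links: [] };
  if (strategy === "root-links-all") {
    root.links = pages.map((page) => page.name);
  } else {
    const first = pages[0];
    if (first !== undefined) {
      root.links.push(first.name);
    }
    pages.forEach((page, index) => {
      const next = pages[index + 1];
      if (next !== undefined) {
        page.links.push(next.name);
      }
    });
  }
  return { root, pages };
}

const ids = ["a", "b", "c", "d", "e"];
const linkedFromRoot = buildFallbackPages(ids, 2, "root-links-all");
const chained = buildFallbackPages(ids, 2, "chain");
```

With five unbucketized members and a page size of two, both strategies produce `0.ttl`, `1.ttl` and `2.ttl`. Under `root-links-all` the root has to be rewritten every time a fallback page is added; under `chain` the root stays stable but a client needs one extra hop per page.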

And what about the subject page bucketizer? Each page is independent of the others, and no relations exist between these pages. So what could the strategy be here?

KasperZutterman commented 2 years ago

Discussed in https://github.com/TREEcg/bucketizers/discussions/11

Originally posted by **ddvlanck** November 22, 2021

At this moment, whenever a bucketizer fails to extract a property path from an LDES member, it throws an error and the workflow it is used in stops working. The desired behaviour is that, whenever this happens, the LDES member is still assigned a bucket, but one that is different from the other buckets. The root file should then contain relations to the pages generated by the fallback mechanism.

Questions to be answered

- Naming convention for the pages generated by the fallback mechanism
- What if there is no root? (e.g. the subject page bucketizer does not generate a root file)