ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0

Raise an error, warn, or ignore duplicate IDs? #165

Closed (kinow closed 11 months ago)

kinow commented 11 months ago

Hi, I just noticed I had a duplicate ID in my crate. Is that valid for RO-Crate?

I thought about checking something like if File(data).id in crate.data_entities, but I think what uniquely identifies data in the crate is the ID plus the JSON-LD data. Since my JSON-LD data includes the date the entity was created, any two entities I create are never identical.
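
For reference, a check that looks at the @id values alone could be sketched as below. This is only an illustration working on a metadata document already loaded as a plain dict; find_duplicate_ids and the sample metadata are made up for this example and are not part of ro-crate-py.

```python
from collections import Counter


def find_duplicate_ids(metadata):
    """Return the @id values that appear more than once in the @graph.

    Hypothetical helper: compares @id only, ignoring the rest of the
    JSON-LD data (so differing dateCreated values do not matter).
    """
    graph = metadata.get("@graph", [])
    counts = Counter(entity["@id"] for entity in graph if "@id" in entity)
    return sorted(id_ for id_, n in counts.items() if n > 1)


# Made-up sample: two entities share an @id but differ in dateCreated.
metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "./", "@type": "Dataset"},
        {"@id": "data.csv", "@type": "File", "dateCreated": "2023-01-01"},
        {"@id": "data.csv", "@type": "File", "dateCreated": "2023-01-02"},
    ],
}

duplicates = find_duplicate_ids(metadata)
print(duplicates)
```

Whether to raise, warn, or ignore when the returned list is non-empty is then a policy decision for the caller.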

simleo commented 11 months ago

The spec says: "the RO-Crate Metadata JSON @graph MUST NOT list multiple entities with the same @id; behaviour of consumers of an RO-Crate encountering multiple entities with the same @id is undefined". In ro-crate-py, if you add an entity with the same @id as an entity that's already in the crate, the old one is overwritten. The problem is that, for historical reasons, the default_entities, data_entities and contextual_entities attributes were independent of the internal dictionary that actually stores entities (__entity_map), so duplicates were possible. I fixed that in #166.
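
To illustrate the overwrite semantics, here is a toy model, not the actual ro-crate-py code: MiniCrate and its attributes are hypothetical names. It assumes a single dictionary keyed by @id is the only store, with the entity list derived from it on demand, which is one way to make duplicates impossible.

```python
class MiniCrate:
    """Toy stand-in for a crate; NOT the ro-crate-py implementation."""

    def __init__(self):
        # Single source of truth: maps @id -> entity (a plain dict here).
        self._entity_map = {}

    def add(self, entity):
        # Adding an entity whose @id already exists replaces the old one.
        self._entity_map[entity["@id"]] = entity
        return entity

    @property
    def data_entities(self):
        # Derived from the map on every access, so it cannot drift out of
        # sync with it or accumulate duplicates.
        return [e for e in self._entity_map.values() if e.get("@type") == "File"]


crate = MiniCrate()
crate.add({"@id": "data.csv", "@type": "File", "dateCreated": "2023-01-01"})
crate.add({"@id": "data.csv", "@type": "File", "dateCreated": "2023-01-02"})

print(len(crate.data_entities))
print(crate.data_entities[0]["dateCreated"])
```

With this layout the second add replaces the first entity, so only the later dateCreated remains.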

kinow commented 11 months ago

Thank you @simleo !