ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0

Store refs to Entity instances #128

Closed: simleo closed this 2 years ago

simleo commented 2 years ago

Not for merging: creating this so some trace remains of this exploration.

Entity objects store their JSON representation in the _jsonld attribute, which is returned by the properties() method. We have __getitem__ and __setitem__ overrides that automatically convert between entities and references to them, so that we can do:

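```python
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person

crate = ROCrate()
alice = crate.add(Person(crate, "#alice"))
bob = crate.add(Person(crate, "#bob"))
crate.root_dataset["author"] = [alice, bob]
for a in crate.root_dataset["author"]:
    print(a.id)
# #alice
# #bob
```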

However, since __getitem__ builds a new list on the fly, appending to that list does not change the entity, so the following does nothing:

```python
trudy = crate.add(Person(crate, "#trudy"))
# appends to a temporary list; the root dataset's "author" is unchanged
crate.root_dataset["author"].append(trudy)
```
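To make the problem concrete, here is a minimal, self-contained toy model of the current design. The Crate and Entity classes below are hypothetical stand-ins, not ro-crate-py's actual classes: Entity keeps only a JSON-LD dict, and __getitem__ resolves {"@id": ...} references into a fresh list on every call, which is why the append is lost.

```python
from typing import Any


class Crate:
    """Toy stand-in for ROCrate: maps @id -> entity (hypothetical, not the library class)."""

    def __init__(self) -> None:
        self.entities: dict[str, "Entity"] = {}

    def add(self, entity: "Entity") -> "Entity":
        self.entities[entity.id] = entity
        return entity


class Entity:
    """Stores only its JSON-LD dict; references are plain {"@id": ...} dicts."""

    def __init__(self, crate: Crate, identifier: str) -> None:
        self.crate = crate
        self.id = identifier
        self._jsonld: dict[str, Any] = {"@id": identifier}

    def properties(self) -> dict[str, Any]:
        return self._jsonld

    def __setitem__(self, key: str, value: Any) -> None:
        # Entity instances are flattened into JSON-LD references on the way in.
        def to_json(v: Any) -> Any:
            return {"@id": v.id} if isinstance(v, Entity) else v
        self._jsonld[key] = [to_json(v) for v in value] if isinstance(value, list) else to_json(value)

    def __getitem__(self, key: str) -> Any:
        # References are resolved on the way out, into a *new* list on every call.
        def from_json(v: Any) -> Any:
            if isinstance(v, dict) and "@id" in v:
                return self.crate.entities.get(v["@id"], v)
            return v
        value = self._jsonld[key]
        return [from_json(v) for v in value] if isinstance(value, list) else from_json(value)


crate = Crate()
root = crate.add(Entity(crate, "./"))
alice = crate.add(Entity(crate, "#alice"))
bob = crate.add(Entity(crate, "#bob"))
trudy = crate.add(Entity(crate, "#trudy"))

root["author"] = [alice, bob]
print([a.id for a in root["author"]])  # ['#alice', '#bob']

root["author"].append(trudy)  # mutates a temporary list only
print([a.id for a in root["author"]])  # ['#alice', '#bob']
```

Because the stored data is the JSON-LD dict rather than the list handed back to the caller, any in-place mutation of a returned value is silently discarded.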

In this PR, I've explored a possible way to make the above work. I've changed the implementation of Entity so that it stores actual references to other Entity instances and generates the JSON representation on the fly in properties(). As the changes I had to make to other parts of the code show, this makes things easier when creating a crate in memory and serializing it to disk, but harder when the crate is read from disk and then updated. The problem is that entities can refer to each other, and a reference may point to an entity that has not been read yet; this is why I added the __resolve_references method to the ROCrate object and call it at the end of the reading process. That's not the end of the story, though, since users might try to add a new entity with:

```python
# "funder" is given as a raw JSON-LD reference, not as an Entity instance
crate.add(Person(crate, "#carlos", properties={
    "funder": {"@id": "#alice"}
}))
```

So we should either recommend against doing this or document that it requires an additional reference-resolution step.
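The approach explored in this PR can be sketched with another toy model. The Entity class and the resolve_references function are hypothetical (resolve_references only plays the role of the PR's private __resolve_references; this is not the PR's code). Entities hold Entity instances directly and build their JSON-LD in properties(); after reading, a resolution pass swaps {"@id": ...} dicts for entities. The last lines reproduce the carlos case: a reference added afterwards stays a plain dict until the pass is run again.

```python
from typing import Any, Optional


class Entity:
    """Toy entity that stores Entity instances directly (hypothetical, not the library class)."""

    def __init__(self, identifier: str, properties: Optional[dict[str, Any]] = None) -> None:
        self.id = identifier
        self._props: dict[str, Any] = dict(properties or {})

    def __getitem__(self, key: str) -> Any:
        return self._props[key]  # the stored object itself, so in-place mutation sticks

    def __setitem__(self, key: str, value: Any) -> None:
        self._props[key] = value

    def properties(self) -> dict[str, Any]:
        # The JSON-LD representation is generated on the fly from the stored references.
        def to_json(v: Any) -> Any:
            return {"@id": v.id} if isinstance(v, Entity) else v
        out: dict[str, Any] = {"@id": self.id}
        for k, v in self._props.items():
            out[k] = [to_json(x) for x in v] if isinstance(v, list) else to_json(v)
        return out


def resolve_references(entities: dict[str, Entity]) -> None:
    """Replace {"@id": ...} dicts with the Entity they point to (hypothetical helper).

    Must run after *all* entities exist, since a reference may point forward.
    """
    def resolve(v: Any) -> Any:
        if isinstance(v, dict) and "@id" in v and v["@id"] in entities:
            return entities[v["@id"]]
        return v
    for e in entities.values():
        for k, v in e._props.items():
            e._props[k] = [resolve(x) for x in v] if isinstance(v, list) else resolve(v)


# Simulate reading a crate from disk: entities first hold raw JSON-LD references.
raw = [
    {"@id": "./", "author": [{"@id": "#alice"}, {"@id": "#bob"}]},
    {"@id": "#alice"},
    {"@id": "#bob"},
]
entities = {d["@id"]: Entity(d["@id"], {k: v for k, v in d.items() if k != "@id"}) for d in raw}
resolve_references(entities)  # "./" refers forward to "#alice" and "#bob"

root = entities["./"]
trudy = entities["#trudy"] = Entity("#trudy")
root["author"].append(trudy)  # now actually updates the root dataset
print(root.properties()["author"])
# [{'@id': '#alice'}, {'@id': '#bob'}, {'@id': '#trudy'}]

# An entity added later with a raw reference is not resolved automatically.
carlos = entities["#carlos"] = Entity("#carlos", {"funder": {"@id": "#alice"}})
print(isinstance(carlos["funder"], Entity))  # False
resolve_references(entities)  # a second pass is needed
print(carlos["funder"] is entities["#alice"])  # True
```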

Moreover, this would still not cover the common case of appending to a property that has zero values (the property is absent) or one value (the value is not a list), so I decided to abandon this route and add an append_to method instead:

```python
crate.root_dataset.append_to("author", trudy)
```
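The three cases an append_to-style method has to handle can be shown with a minimal standalone function operating on a plain JSON-LD dict. This is a hypothetical sketch, not ro-crate-py's actual append_to, whose signature and exact behavior may differ; in particular, turning an absent property into a one-element list is a design choice assumed here.

```python
from typing import Any


def append_to(jsonld: dict[str, Any], prop: str, value: Any) -> None:
    """Append value to jsonld[prop], whatever its current arity (hypothetical helper).

    - 0 values (property absent): create a one-element list
    - 1 value (not a list): promote it to a list, then append
    - a list: append in place
    """
    if prop not in jsonld:
        jsonld[prop] = [value]
    elif isinstance(jsonld[prop], list):
        jsonld[prop].append(value)
    else:
        jsonld[prop] = [jsonld[prop], value]


root: dict[str, Any] = {"@id": "./", "author": {"@id": "#alice"}}  # one value, not a list
append_to(root, "author", {"@id": "#bob"})    # promoted to a list
append_to(root, "author", {"@id": "#trudy"})  # appended in place
append_to(root, "keywords", "RO-Crate")       # property was absent
print(root["author"])
# [{'@id': '#alice'}, {'@id': '#bob'}, {'@id': '#trudy'}]
print(root["keywords"])  # ['RO-Crate']
```

Working on the stored data directly sidesteps the temporary-list problem without changing how entities are represented.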