Closed SteffenBrinckmann closed 2 years ago
Use get_entities. Example:
{
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "author": {"@id": "https://orcid.org/0000-0002-1825-0097"}
        },
        {
            "@id": "https://orcid.org/0000-0002-1825-0097",
            "@type": "Person"
        }
    ]
}
for e in crate.get_entities():
    print((e.id, e.type))
('ro-crate-metadata.json', 'CreativeWork')
('./', 'Dataset')
('https://orcid.org/0000-0002-1825-0097', 'Person')
Sorry for not being clear. I can iterate through the top-level entities as you mentioned. That is a list, which is a special case of graph.
But each entity can have a 'hasPart' property which contains a list of ids for 'sub'-entities, which can then contain even another level of 'hasPart' and so on. That would build a complete graph, potentially.
Is there a method to iterate through all of those, other than implementing a recursive function, which might even run into endless loops if sub-sub-nodes become parent nodes in a complex graph?
Such a graph can be built for any kind of relationship, not just hasPart. I think it's best to use a specialized library such as networkx for that. You could try something like this:
from rocrate.rocrate import ROCrate
from rocrate.model.entity import Entity
import networkx as nx

crate = ROCrate("/path/to/crate")
g = nx.DiGraph()
for e in crate.get_entities():
    parts = e.get("hasPart")
    if not parts:
        continue
    # A single value is not wrapped in a list, so normalize it
    if not isinstance(parts, list):
        parts = [parts]
    for p in parts:
        # Skip values that are not entities (e.g. plain strings)
        if isinstance(p, Entity):
            g.add_edge(e.id, p.id)
At this point you can iterate through the nodes via the networkx API. Any time you need to resolve an id back to the entity, just use crate.get.
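If you'd rather not pull in networkx, the endless-loop concern can also be handled with a visited set. Here is a minimal sketch, assuming the hasPart links have already been collected into a plain dict of ids. The has_part dict and the walk helper are made up for illustration and are not part of ro-crate-py:

```python
from collections import deque


def walk(graph, start):
    """Yield every id reachable from start exactly once (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        for child in graph.get(node, []):
            # The visited set is what makes cycles harmless
            if child not in seen:
                seen.add(child)
                queue.append(child)


# Hypothetical hasPart edges, including a cycle back to the root
has_part = {
    "./": ["data/", "#note"],
    "data/": ["data/a.txt", "./"],
}

order = list(walk(has_part, "./"))
print(order)
```

Each id is visited once even though data/ points back to ./. With the networkx graph built above, nx.dfs_preorder_nodes(g, "./") should give you a comparable cycle-safe walk.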
Thanks @simleo, for the help.
A related question: am I correct that only the RO-Crate top level is parsed?
Can I force all entities to be parsed? Why are some identifiers prefixed with '#'?
RO-Crate metadata files contain flattened JSON-LD, so everything is top-level (all entities appear directly under @graph). Identifiers with a leading # are local to the RO-Crate; the corresponding entities are parsed just like the others.
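To illustrate the flattened layout, here is a small sketch that reads a metadata document with the standard json module and lists every entity under @graph. The sample document and the "#alice" id are invented for the example; ro-crate-py does this parsing for you:

```python
import json

# Hypothetical, minimal ro-crate-metadata.json content
doc = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "author": {"@id": "#alice"}},
    {"@id": "#alice", "@type": "Person", "name": "Alice"}
  ]
}
""")

# Every entity sits directly under @graph; references are just {"@id": ...}
ids = [entity["@id"] for entity in doc["@graph"]]

# Ids starting with "#" are local to the crate but listed like any other
local_ids = [i for i in ids if i.startswith("#")]

print(ids)
print(local_ids)
```

The author is not nested inside the Dataset entity: the Dataset only holds a reference to "#alice", and the Person entity itself is a sibling in the same list.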
As for Python type (type(e)) vs semantic type (e.type): the ro-crate-py model defines only a small number of specialized types, typically in cases where there is significant functionality associated with them. For instance, for File entities there is a File Python class with methods that specify what to do when it's written to disk. Similarly, directories are modeled by Dataset, which has a corresponding specific Python type. The vast majority of data entities fall into these two groups, so they will have a specific Python type. In most cases, however, the Python type for a contextual entity would just be ContextEntity. In the past we've explored the possibility of mapping all of Schema.org into the Python class hierarchy, but we abandoned it since it would gain us little while adding a lot of unnecessary complexity. Also note that RO-Crate entities can have multiple types, not necessarily tied by a parent-child relationship, so in general there cannot be a perfect matching between Python and semantic type.
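As a rough illustration of the multiple-types point, this sketch works on raw JSON-LD dicts rather than ro-crate-py objects. The semantic_types helper and the two sample entities are hypothetical; it only shows that @type may be either a single string or a list:

```python
def semantic_types(entity):
    """Return the @type of a JSON-LD entity as a list of strings."""
    value = entity.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


# Hypothetical entities: one with several unrelated types, one with a single type
workflow = {
    "@id": "workflow.ga",
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
}
person = {"@id": "#alice", "@type": "Person"}

print(semantic_types(workflow))
print(semantic_types(person))
```

A single Python class cannot naturally stand for all three types of the first entity at once, which is why checking the semantic type is usually more reliable than checking the Python type.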
Thank you so much for all the explanations.
Hey, is there an easy way to iterate/walk through the graph in Python? As far as I can see in version 0.7, the top node is parsed and then, upon request, one can go to a different node. Is there an automatic function to iterate/walk through each node? Thanks, Steffen