ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0

Bug: When ingesting a File entity its @id gets a # suffix #77

ptsefton closed this issue 2 years ago

ptsefton commented 2 years ago

If I ingest the crate below and then inspect the entities, {"@id": "test.csv", "@type": "File"} turns into

{'@id': '#test.csv', '@type': 'File'}

I have tried this with the file in the directory and without - same result.

Using code like this:

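from rocrate.rocrate import ROCrate

crate = ROCrate("./")  # read the crate in the current directory
for e in crate.get_entities():
    print(e.as_jsonld())  # JSON entry

The crate's ro-crate-metadata.json:

{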
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    {
      "@vocab": "http://schema.org/"
    },
    {
      "@base": null
    }
  ],
  "@graph": [
    {
      "@id": "#collection",
      "@type": "RepositoryCollection ",
      "name": "Test collection"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "hasFile": [{"@id": "test.csv"}],
      "hasPart": [
        {
          "@id": "#collection"
        }
      ],
      "name": "testing hasPart"
    },
    {"@id": "test.csv", "@type": "File"},
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": {
        "@id": "./"
      },
      "identifier": "ro-crate-metadata.json"
    }

  ]
}

simleo commented 2 years ago

The spec says data entities MUST be linked from the root via hasPart; hasFile, OTOH, is only mentioned as a term imported from PCDM. When reading a crate, the library adds all items listed in hasPart as data entities, then assumes everything else ("test.csv", in this case) is a contextual entity. Since "test.csv" is not an absolute URI, it adds a # to turn it into a local id (this is done for all contextual entities).
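
The behaviour described above can be illustrated with a small standalone sketch. This is not the library's actual code: `classify_entities` and `is_absolute_uri` are hypothetical helpers, and the "has a URI scheme" check is an assumed simplification of how an absolute URI is detected.

```python
from urllib.parse import urlsplit


def is_absolute_uri(identifier):
    # Assumption: anything with a URI scheme (e.g. "https:") counts as absolute.
    return bool(urlsplit(identifier).scheme)


def classify_entities(graph, root_id="./", metadata_id="ro-crate-metadata.json"):
    """Split a crate's @graph into data and contextual entity ids (simplified)."""
    root = next(e for e in graph if e["@id"] == root_id)
    # Only entities linked from the root via hasPart are data entities.
    data_ids = [ref["@id"] for ref in root.get("hasPart", [])]
    contextual_ids = []
    for entity in graph:
        id_ = entity["@id"]
        if id_ in (root_id, metadata_id) or id_ in data_ids:
            continue
        # Everything else is contextual; relative ids become local "#" ids.
        if not is_absolute_uri(id_) and not id_.startswith("#"):
            id_ = "#" + id_
        contextual_ids.append(id_)
    return data_ids, contextual_ids


# The @graph from the report, reduced to the fields that matter here.
graph = [
    {"@id": "#collection", "@type": "RepositoryCollection"},
    {
        "@id": "./",
        "@type": "Dataset",
        "hasFile": [{"@id": "test.csv"}],
        "hasPart": [{"@id": "#collection"}],
    },
    {"@id": "test.csv", "@type": "File"},
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork"},
]

data_ids, contextual_ids = classify_entities(graph)
print(data_ids)        # ['#collection']
print(contextual_ids)  # ['#test.csv']
```

Because "test.csv" is referenced only through hasFile, the sketch never sees it as a data entity, which is why it ends up with the # prefix.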

To sum it up, I believe the library is behaving consistently with the spec here. I will close this and perhaps you can open an issue on the ro-crate repo if you think a spec update is necessary.

ptsefton commented 2 years ago

OK, my test case was wrong - I keep getting hasPart and hasFile mixed up. My mistake, sorry!

And yes, that is what the spec says. When I introduced the abstract repository classes from PCDM, it was to allow a crate to contain a set of files with an abstract structure that does not necessarily reflect the directory structure, as exported from a database or repository. What this means is that we probably should have said that the PCDM classes are DataEntities that can be chained like Datasets/directories, and that hasFile is equivalent to hasPart.
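
For reference, one way to correct the test case is to link test.csv from the root Dataset via hasPart rather than hasFile. The sketch below does this on a plain dict; `link_files_via_has_part` is a hypothetical helper written for illustration, not part of ro-crate-py.

```python
def link_files_via_has_part(root):
    """Return a copy of the root Dataset with hasFile references moved to hasPart."""
    fixed = {key: value for key, value in root.items() if key != "hasFile"}
    parts = list(root.get("hasPart", []))
    for ref in root.get("hasFile", []):
        if ref not in parts:  # avoid listing the same entity twice
            parts.append(ref)
    fixed["hasPart"] = parts
    return fixed


# Root Dataset entity from the original test case.
root = {
    "@id": "./",
    "@type": "Dataset",
    "hasFile": [{"@id": "test.csv"}],
    "hasPart": [{"@id": "#collection"}],
    "name": "testing hasPart",
}

fixed = link_files_via_has_part(root)
print(fixed["hasPart"])  # [{'@id': '#collection'}, {'@id': 'test.csv'}]
```

With test.csv listed under hasPart, it should be treated as a data entity on ingest and keep its @id unchanged.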