ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0

BUG: If Root Data Entity has an arcp:// ID the arcp:// part gets stripped off it #167

Closed: ptsefton closed this issue 8 months ago

ptsefton commented 8 months ago

Here's some code to reproduce the error:

from rocrate.rocrate import ROCrate
import json
import os
test_crate = {
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    {
      "@vocab": "http://schema.org/"
    }
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": [
        {
          "@id": "https://w3id.org/ro/crate/1.2"
        }
      ],
      "about": {
        "@id": "arcp://name,corpus-of-oz-early-english"
      }

    },
    {"@id": "arcp://name,corpus-of-oz-early-english", "@type": "Dataset"}
  ]}
# Write the metadata file into a fresh crate directory.
os.makedirs("test_crate", exist_ok=True)
with open("test_crate/ro-crate-metadata.json", "w") as f:
    json.dump(test_crate, f, indent=2)

# Load the crate back and compare the root dataset id with the original.
crate = ROCrate("test_crate")
original_id = test_crate["@graph"][1]["@id"]
print("ID of root dataset", crate.root_dataset.id)  # name,corpus-of-oz-early-english
print("ID of original", original_id)  # arcp://name,corpus-of-oz-early-english
print("same?", crate.root_dataset.id == original_id)

simleo commented 8 months ago

Fixed in #168. As a workaround for earlier versions, add a trailing slash to the URL, e.g. arcp://name,corpus-of-oz-early-english/. The bug affected all URLs, not just arcp ones.
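
For anyone stuck on an earlier version, the trailing-slash workaround could be applied to the metadata dict before it is written to disk. This is only a rough sketch: `add_trailing_slash_to_root` is a hypothetical helper, not part of ro-crate-py, and it assumes the descriptor entity has the id `ro-crate-metadata.json` and a single `about` reference. It only patches the descriptor's `about` and the root entity itself, so any other references to the root id in the graph would need the same treatment.

```python
import copy


def add_trailing_slash_to_root(metadata, descriptor_id="ro-crate-metadata.json"):
    """Return a copy of `metadata` whose root dataset id ends with "/".

    Hypothetical helper for illustration; not part of ro-crate-py.
    """
    patched = copy.deepcopy(metadata)
    graph = patched["@graph"]
    # The metadata descriptor points at the root dataset via "about".
    descriptor = next(e for e in graph if e.get("@id") == descriptor_id)
    old_id = descriptor["about"]["@id"]
    if old_id.endswith("/"):
        return patched
    new_id = old_id + "/"
    descriptor["about"]["@id"] = new_id
    # Rename the root dataset entity to match the new reference.
    for entity in graph:
        if entity.get("@id") == old_id:
            entity["@id"] = new_id
    return patched


example = {
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "arcp://name,corpus-of-oz-early-english"},
        },
        {"@id": "arcp://name,corpus-of-oz-early-english", "@type": "Dataset"},
    ]
}

patched = add_trailing_slash_to_root(example)
print(patched["@graph"][1]["@id"])
```

The input dict is deep-copied, so the original metadata stays unchanged and can still be compared against the patched version.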

Note that the comparison in the code above would still fail, because a trailing slash is automatically added to Dataset ids (to comply with RO-Crate's "SHOULD end with /" recommendation).
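
If the goal is just to check that the root dataset id survived the round trip, one option is to compare the ids while ignoring trailing slashes. The snippet below is a minimal sketch with a hypothetical helper, `same_dataset_id`; the `loaded_id` value is an assumption about what the fixed version would report for the crate above, based on the trailing slash being added automatically.

```python
def same_dataset_id(a, b):
    """Compare two Dataset ids, ignoring any trailing slashes.

    Hypothetical helper for illustration; not part of ro-crate-py.
    """
    return a.rstrip("/") == b.rstrip("/")


# Assumed id reported after the fix (trailing slash added on load).
loaded_id = "arcp://name,corpus-of-oz-early-english/"
# Id as written in the original metadata.
original_id = "arcp://name,corpus-of-oz-early-english"

print("strict comparison:", loaded_id == original_id)
print("ignoring trailing slash:", same_dataset_id(loaded_id, original_id))
```

A strict `==` is still the right check for ids that already end with a slash in the source metadata; the looser comparison is only meant for cases like the reproduction script above.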