ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0

Bug: does not write crates with arcp IDs correctly #175

Closed rosanna-smith closed 5 months ago

rosanna-smith commented 7 months ago

If you have a crate with an arcp identifier on the root dataset, the write method creates an unwanted arcp directory in the export and changes the ID to ./

Here's the code to reproduce this error:

import os
import json
from rocrate.model.person import Person

# Input data
input_data = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "arcp://name,farms-to-freeways-example-dataset",
            "@type": "Dataset",
            "datePublished": "2024-01-31T04:46:07+00:00"

        },
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {
                "@id": "arcp://name,farms-to-freeways-example-dataset"
            },
            "conformsTo": {
                "@id": "https://w3id.org/ro/crate/1.1"
            }
        },

        {
            "@id": "https://orcid.org/0000-0000-0000-0000",
            "@type": "Person",
            "affiliation": "University of Flatland",
            "name": "Alice Doe"
        },
        {
            "@id": "https://orcid.org/0000-0000-0000-0001",
            "@type": "Person",
            "affiliation": "University of Flatland",
            "name": "Bob Doe"
        }
    ]
}

os.mkdir("input_crate")

with open('input_crate/ro-crate-metadata.json', 'w') as f:
    json.dump(input_data, f)

crate = ROCrate("input_crate")

alice_id = "https://orcid.org/0000-0000-0000-0000"
bob_id = "https://orcid.org/0000-0000-0000-0001"
alice = crate.add(Person(crate, alice_id, properties={
    "name": "Alice Doe",
    "affiliation": "University of Flatland"
}))
bob = crate.add(Person(crate, bob_id, properties={
    "name": "Bob Doe",
    "affiliation": "University of Flatland"
}))

crate.write("exp_crate")

It produces this error:

Traceback (most recent call last):
  File "/Users/rosannasmith/Documents/LDaCA/Repos/oni-downloader/test.py", line 60, in <module>
    crate.write("exp_crate")
  File "/Users/rosannasmith/Documents/LDaCA/Repos/oni-downloader/venv/lib/python3.12/site-packages/rocrate/rocrate.py", line 452, in write
    writable_entity.write(base_path)
  File "/Users/rosannasmith/Documents/LDaCA/Repos/oni-downloader/venv/lib/python3.12/site-packages/rocrate/model/dataset.py", line 57, in write
    raise FileNotFoundError(
FileNotFoundError: [Errno 2] No such file or directory: 'arcp://name,farms-to-freeways-example-dataset'

simleo commented 6 months ago

I could not reproduce the error. After adding the missing from rocrate.rocrate import ROCrate line, I ran the above code and it completed with no errors. No directory was created in exp_crate and no @id was changed to ./.

This is with ro-crate-py from the current master branch. I was able to reproduce the error with ro-crate-py 0.9.0, so this problem must have been fixed as a side effect of something that got merged after 0.9.0.
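For anyone re-running the reproduction, the two symptoms from the original report (a stray directory in the output and a root @id rewritten to ./) can be checked with a short standard-library script. This is only a sketch: the helper names root_dataset_id and check_written_crate are made up for illustration and are not part of ro-crate-py, and it assumes the crate was written as a plain directory containing ro-crate-metadata.json.

```python
import json
import os


def root_dataset_id(metadata):
    """Return the @id that the metadata descriptor's "about" points to."""
    for entity in metadata["@graph"]:
        if entity.get("@id") == "ro-crate-metadata.json":
            return entity["about"]["@id"]
    raise ValueError("no metadata descriptor found in @graph")


def check_written_crate(crate_dir, expected_root_id):
    """Report whether the root @id survived and list any subdirectories.

    Hypothetical helper: the example crate has no data entities, so any
    subdirectory found in the output is unexpected.
    """
    with open(os.path.join(crate_dir, "ro-crate-metadata.json")) as f:
        metadata = json.load(f)
    subdirs = sorted(
        name for name in os.listdir(crate_dir)
        if os.path.isdir(os.path.join(crate_dir, name))
    )
    return {
        "root_id_preserved": root_dataset_id(metadata) == expected_root_id,
        "subdirectories": subdirs,
    }
```

Calling check_written_crate("exp_crate", "arcp://name,farms-to-freeways-example-dataset") after the script above finishes should show whether the ID was kept and whether any directory was created.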

jmfernandez commented 6 months ago

I guess the issue might be related to some default in the urllib.parse library that depends on the Linux distribution or the Python installer, because I have been able to reproduce the issue found by @rosanna-smith with Python versions from 3.7 to 3.11 (btw, I'm using Gentoo Linux). I could not reproduce the issue with Python 3.12 because a different issue related to pkg_resources arose.

I experienced something similar in an unrelated development when I was testing several interactions between JSON-LD processing libraries, relative URI resolution and the scheme used for the permanent identifiers.
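To see what urllib.parse makes of the identifiers involved on a given interpreter, a small inspection script can help. This is a sketch for comparing environments and does not establish whether urllib.parse is actually the cause; describe_id is a hypothetical helper name, not something from ro-crate-py.

```python
from urllib.parse import urlsplit


def describe_id(identifier):
    """Split an identifier and report the parts relevant to this issue."""
    parts = urlsplit(identifier)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        # An identifier with a scheme is an absolute URI, not a local path.
        "is_absolute": bool(parts.scheme),
    }


if __name__ == "__main__":
    for identifier in (
        "arcp://name,farms-to-freeways-example-dataset",
        "./",
        "https://orcid.org/0000-0000-0000-0000",
    ):
        print(identifier, describe_id(identifier))
```

Running it under each Python version being tested and comparing the printed output would show whether the parsing differs between installations.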

jmfernandez commented 6 months ago

All the tests were done in freshly created Python venvs, first updating pip and wheel, then installing the rocrate package, and finally running the script (with from rocrate.rocrate import ROCrate added near its beginning).

stain commented 6 months ago

We should be supporting ARCP URIs as described in https://www.researchobject.org/ro-crate/1.1/appendix/relative-uris.html#establishing-a-base-uri-inside-a-zip-file, and we currently claim that Python 3.7 is supported.

@elichad will investigate

elichad commented 6 months ago

This seems to be the same issue as #167, just with a slightly different manifestation. That issue was fixed in PR #168 but the fix hasn't been released yet - @stain @simleo is there anything blocking us from making a release?

simleo commented 6 months ago

I think we can make a release after merging #173.

simleo commented 5 months ago

@rosanna-smith can you check that the problem is solved in ro-crate-py 0.10.0?

rosanna-smith commented 5 months ago

Thanks! Can confirm this solved the problem on my end as well.
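For readers who land here with the same traceback, a quick way to check whether an environment already has a release at or after 0.10.0 is sketched below. It assumes Python 3.8+ (for importlib.metadata) and the PyPI distribution name rocrate; release_tuple, has_fix and installed_rocrate_version are hypothetical helper names. Versions are compared numerically, since a plain string comparison would rank "0.9.0" above "0.10.0", and any pre-release suffix is ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn '0.10.0' into (0, 10, 0), ignoring any non-numeric suffix."""
    numbers = []
    for part in version_string.split(".")[:3]:
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def has_fix(version_string, fixed_in="0.10.0"):
    """True if version_string is at or after the release named in the thread."""
    return release_tuple(version_string) >= release_tuple(fixed_in)


def installed_rocrate_version():
    """Return the installed rocrate version string, or None if not installed."""
    try:
        return version("rocrate")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_rocrate_version()
    if installed is None:
        print("rocrate is not installed in this environment")
    else:
        print(installed, "has fix:", has_fix(installed))
```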