ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0
46 stars 23 forks source link

Add methods for adding and updating JSON-LD directly (partials for WMS) #149

Closed kinow closed 1 year ago

kinow commented 1 year ago

Closes #146

@simleo I was toying with the code you suggest a little in the Autosubmit code base. But then I thought that if I wrote those functions in Autosubmit, other WMS would still require to do the same.

It occurred to me, then, maybe add these new methods to ro-crate-py instead? I made this draft pull request after testing it with Autosubmit. You can see the current implementation here: https://earth.bsc.es/gitlab/es/autosubmit/-/merge_requests/317/diffs

Most of the code in autosubmit.py in that merge request is retrieving metadata about the Autosubmit workflow configuration and execution logs and data. But what previously was

        yaml_file = Path(experiment_path, 'conf/rocrate.yml')
        try:
            with open(yaml_file) as f:
                try:
                    yaml_content = YAMLParserFactory().create_parser().load(f)
                except Exception as e:
...
...
# Create RO-Crate and add Autosubmit data
...
...
        # Fill-in the pre-populated data
        crate.license = yaml_content['license']
        # These two sets keep track of the authors and orgs to add to the CRATE
        authors = set()
        organizations = set()
        for author in yaml_content['authors']:
            authors.add(author['orcid'])
            organizations.add(author['ror'])
            crate.add(Person(crate, author['orcid'], {
                'name': author['name'],
                'contactPoint': { '@id': f'mailto: {author["email"]}' },
                'affiliation': { '@id': f'mailto: {author["ror"]}' }
            }))
            crate.add(ContextEntity(crate, f'mailto: {author["email"]}', {
                '@type': 'ContactPoint',
                'contactType': 'Author',
                'email': author['email'],
                'identifier': author['email'],
                'url': author['orcid'],
            }))
            crate.add(ContextEntity(crate, author['ror'], {
                '@type': 'Organization',
                'name': author['organisation_name']
            }))
        crate.creator = { '@id': id for id in authors }
        crate.publisher = { '@id': id for id in organizations }

Now became, basically

        json_file = Path(experiment_path, 'conf/rocrate.json')
        try:
            with open(json_file, 'r') as f:
                try:
                    json_content = json.load(f)
                except Exception as e:
                    raise AutosubmitCritical(f'Failed to parse $expid/conf/rocrate.json pre-populated file {json_file}', 7011, e)
        except IOError:
...
...
# Create RO-Crate and add Autosubmit data
# because ro-crate-py's add replaces data.
...
...
            if '@graph' in json_content:
                for jsonld_node in json_content['@graph']:
                    crate.add_or_update_jsonld(jsonld_node)

            # Write RO-Crate ZIP.
            crate.write_zip(Path(experiment_path, "rocrate.zip"))

I liked the general structure, and I think it could be applied to other WMS that preferred to go JSON-LD → ro-crate-py → JSON-LD, instead of what I had in the merge request before, YAML → Python → ro-crate-py → JSON-LD (where YAML → Python could possibly vary between WMS's).

If it sounds like a good idea I will add docs & tests :+1: Thank you for the code example.

Also added another commit with comments & f-strings for code that I was having a look, and trying to understand before using it. Let me know if you prefer that I drop that commit and open a separate pull request for it.

Cheers Bruno

simleo commented 1 year ago

I've added a commit with some changes:

To answer the question in the TODO comment: when updating, any keys starting with @ need to be removed (after popping the @id). You don't want @type to be there, to avoid messing up things. When adding, instead, @type should be there: if not, a Thing will be created (that might be what the user wants though).

Please do add unit tests for the new methods. Better add a new test module for these.

kinow commented 1 year ago

Thanks for the review @simleo ! Will take a look at your commit and update with tests.

simleo commented 1 year ago

I've changed the docs to make them more general, since the new methods are also general (as appropriate for a generic RO-Crate library) and useful beyond RO-Crate generation by a WMS. I've also tweaked the code a bit, details are in the commits. Thanks for the contribution!

kinow commented 1 year ago

Oh, and the test failure happened on macos... in other projects I had to kick macos jobs as they were less reliable then linux on github-actions. Maybe a kick will fix it, otherwise let me know and I can take a look and try to make the build pass.

simleo commented 1 year ago

Oh, and the test failure happened on macos...

I think there's something weird happening with the macOS CI instances. I've added a skipif clause to run that test only on posix.