GLAM-Workbench / glam-workbench.github.io

https://glam-workbench.github.io/

Add RO-Crate metadata to notebooks and repositories #65

Closed wragge closed 10 months ago

wragge commented 1 year ago

As part of the HASS Community Data Lab, I'm adding RO-Crate metadata to notebooks and repositories to enable information to be harvested and used to populate a tools registry.

The basic idea is that RO-Crate metadata would be saved in a notebook's metadata. Then, when notebooks are pushed to a repository, an action would run to extract the metadata from the notebooks and save it into an RO-Crate JSON file. Some work on this has already been done as part of the GLAM Workbench repository template (thanks to ATAP).
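A rough sketch of that extraction step, using only the standard library. It assumes each notebook keeps its description under a hypothetical `rocrate` key in its top-level metadata; the key name and the `build_crate` helper are illustrative and not taken from the template or the ATAP work.

```python
import json
from pathlib import Path

# Hypothetical key under which a notebook stores its RO-Crate description.
METADATA_KEY = "rocrate"


def build_crate(repo_dir):
    """Collect embedded notebook metadata into a minimal RO-Crate 1.1 dict."""
    root = {"@id": "./", "@type": "Dataset", "hasPart": []}
    graph = [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        root,
    ]
    for nb_path in sorted(Path(repo_dir).glob("*.ipynb")):
        notebook = json.loads(nb_path.read_text(encoding="utf-8"))
        details = notebook.get("metadata", {}).get(METADATA_KEY)
        if not details:
            continue  # skip notebooks with no embedded description
        # The file name always wins as the entity's @id.
        entity = {"@type": "File", **details, "@id": nb_path.name}
        root["hasPart"].append({"@id": nb_path.name})
        graph.append(entity)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def write_crate(repo_dir):
    """Write the collected metadata to ro-crate-metadata.json in repo_dir."""
    crate = build_crate(repo_dir)
    out_path = Path(repo_dir) / "ro-crate-metadata.json"
    out_path.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return out_path
```

Something like `write_crate(".")` could then be called from whatever runs the update, whether that is a GitHub action or a local script.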

Because I'm going to be working with existing repositories, rather than starting new ones, I'm going to have to add the necessary scripts to each repository, and find a (not too painful) way of adding the metadata to existing notebooks.

What metadata?

Metadata describing a repository, based on schema.org and RO-Crate spec:

Metadata describing a notebook, based on schema.org and RO-Crate spec:

I think the about and keywords properties will take the most thought, as they will be important access points in the context of a tool registry. I may need to use or develop a controlled list.

Writing metadata to notebooks

Notebooks are just JSON, so I could read, edit, and write them as JSON files, but it might be safer to use nbformat to ensure that everything conforms to the notebook file format.
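A minimal sketch of the nbformat route, again assuming a hypothetical `rocrate` metadata key and an illustrative `add_notebook_metadata` helper. It merges new values into whatever is already stored, validates the notebook, and writes it back.

```python
import nbformat

# Hypothetical key under which the RO-Crate description is stored.
METADATA_KEY = "rocrate"


def add_notebook_metadata(nb_path, details):
    """Merge `details` into a notebook's top-level metadata and save it."""
    nb = nbformat.read(str(nb_path), as_version=4)
    existing = nb.metadata.get(METADATA_KEY, {})
    # Keep anything already recorded; new values overwrite matching keys.
    nb.metadata[METADATA_KEY] = {**existing, **details}
    # Raises a ValidationError if the notebook no longer fits the schema.
    nbformat.validate(nb)
    nbformat.write(nb, str(nb_path))
    return nb
```

For example, `add_notebook_metadata("my_notebook.ipynb", {"name": "My notebook"})` would add or update a name without touching the notebook's cells (the file name here is a placeholder).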

I think I'll probably create a script to add basic metadata to notebooks, then I'll manually edit as required. Fields I could automatically populate:

Finish off RO-Crate action

See the pull request from ATAP on the repo template -- adjust and finish off.

Later on...

Once this is done, I should change the way the documentation pages are generated to pull as much as possible from the RO-Crate metadata, so I'm not managing the same info in different places.

wragge commented 1 year ago

Make use of Tadirah for tags/subjects: https://tadirah.info/

wragge commented 1 year ago

Things to do on the RO-Crate maker GitHub action:

wragge commented 1 year ago

https://www.researchobject.org/ro-crate/1.1/appendix/jsonld.html

Multiple values and references can be represented using JSON arrays, as exemplified in hasPart above; however as the RO-Crate JSON-LD is in compacted form, any single-element arrays like "author": [{"@id": "#alice"}] SHOULD be unpacked to a single value like "author": {"@id": "#alice"}.
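One way the unpacking rule quoted above could be applied when writing entities, as a small sketch (the `compact_values` name is made up for illustration):

```python
def compact_values(entity):
    """Return a copy of an entity with single-element lists unpacked."""
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in entity.items()
    }


entity = {
    "@id": "./",
    "author": [{"@id": "#alice"}],
    "hasPart": [{"@id": "a.ipynb"}, {"@id": "b.ipynb"}],
}
compacted = compact_values(entity)
# compacted["author"] is now {"@id": "#alice"}; hasPart is unchanged.
```

Note that this also unpacks a `hasPart` list that happens to hold only one item, so any code reading the crate needs to accept either form.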

wragge commented 1 year ago

ok, after much to and fro I think I'm going to take the following approach (slightly different from the ATAP pull request).

So update_crate.py would be run locally before any changes get pushed, not in a GitHub action on push. This enables me to make manual changes to the crate. Also update_crate.py will update, rather than replace, existing entities. This means I can automatically populate the crate with details from nbs, then enrich as necessary without losing any of these manual changes. I think this best suits my workflow.
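The "update, rather than replace" behaviour might look something like the sketch below. This is not the actual update_crate.py; the `update_entities` helper is hypothetical, and it assumes that values generated from notebooks overwrite properties of the same name while properties that exist only in the crate (the manual enrichments) are left alone.

```python
def update_entities(existing_graph, new_entities):
    """Merge new entities into a graph by @id, keeping manual additions."""
    # Index copies of the existing entities so the input is not mutated.
    merged = {entity["@id"]: dict(entity) for entity in existing_graph}
    for entity in new_entities:
        # Update a matching entity in place, or start a new one.
        current = merged.setdefault(entity["@id"], {})
        current.update(entity)
    return list(merged.values())


existing = [{"@id": "nb.ipynb", "name": "Old name", "keywords": ["maps"]}]
generated = [
    {"@id": "nb.ipynb", "name": "New name"},
    {"@id": "other.ipynb", "name": "Other notebook"},
]
graph = update_entities(existing, generated)
# The manually added keywords survive; the name is refreshed.
```

If manual edits should instead take priority over generated values, the merge order inside the loop would need to be reversed.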

wragge commented 10 months ago

Added to GW Repository Template: https://github.com/GLAM-Workbench/glam-workbench-template