wragge closed this issue 10 months ago.
Make use of Tadirah for tags/subjects: https://tadirah.info/
Things to do on the RO-Crate maker GitHub action:
add links to GitHub (url) and GW section (documentation)
rocrate
https://www.researchobject.org/ro-crate/1.1/appendix/jsonld.html
Multiple values and references can be represented using JSON arrays, as exemplified in hasPart above; however as the RO-Crate JSON-LD is in compacted form, any single-element arrays like "author": [{"@id": "#alice"}] SHOULD be unpacked to a single value like "author": {"@id": "#alice"}.
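The unpacking rule quoted above is easy to apply as a final pass before writing the crate. Below is a minimal sketch of a recursive helper. The function name and the choice to always keep `@graph` as a list are my own assumptions; they are not part of any RO-Crate library.

```python
import json
from typing import Any


def unpack_single_arrays(value: Any, keep_lists: frozenset = frozenset({"@graph"})) -> Any:
    """Recursively replace single-element lists with their only element.

    Keys named in keep_lists (by default just "@graph") stay lists, although
    the entities inside them are still compacted.
    """
    if isinstance(value, dict):
        result = {}
        for key, item in value.items():
            if key in keep_lists and isinstance(item, list):
                # Keep the list itself, but compact each entity inside it.
                result[key] = [unpack_single_arrays(i, keep_lists) for i in item]
            else:
                result[key] = unpack_single_arrays(item, keep_lists)
        return result
    if isinstance(value, list):
        items = [unpack_single_arrays(i, keep_lists) for i in value]
        # Lists with zero or several items are left as lists.
        return items[0] if len(items) == 1 else items
    return value


if __name__ == "__main__":
    crate = {"@graph": [{"@id": "./", "author": [{"@id": "#alice"}]}]}
    print(json.dumps(unpack_single_arrays(crate)))
```

Multi-valued properties such as a two-item `hasPart` are left untouched.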
OK, after much to-and-fro, I think I'm going to take the following approach (slightly different from the ATAP pull request):
On Cookiecutter initialisation, create a basic RO-Crate file in the new repository with the project name, description, and creator details from the Cookiecutter config file.
Have an update_crate.py file in the scripts directory of the repository, which will gather info from notebooks and add it to the crate.
So update_crate.py would be run locally before any changes get pushed, not in a GitHub action on push. This enables me to make manual changes to the crate. Also, update_crate.py will update, rather than replace, existing entities. This means I can automatically populate the crate with details from notebooks, then enrich it as necessary without losing any of these manual changes. I think this best suits my workflow.
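The "update, rather than replace" behaviour can be sketched as a merge keyed on `@id`. This is a minimal sketch that assumes the crate is a plain JSON file with an `@graph` list. The function names are hypothetical and are not taken from the actual update_crate.py.

```python
from __future__ import annotations

import json
from pathlib import Path


def merge_entities(graph: list[dict], new_entities: list[dict]) -> list[dict]:
    """Merge freshly generated entities into an existing @graph, matching on @id.

    Properties in a generated entity overwrite the same properties on the
    existing entity. Any other properties (e.g. ones added by hand) are kept.
    Entities not yet in the graph are appended. The graph is modified in place.
    """
    by_id = {entity["@id"]: entity for entity in graph}
    for new in new_entities:
        existing = by_id.get(new["@id"])
        if existing is None:
            graph.append(new)
            by_id[new["@id"]] = new
        else:
            existing.update(new)
    return graph


def update_crate_file(path: Path, new_entities: list[dict]) -> None:
    """Load a crate file, merge in new entities, and write it back."""
    crate = json.loads(path.read_text(encoding="utf-8"))
    merge_entities(crate["@graph"], new_entities)
    path.write_text(json.dumps(crate, indent=2), encoding="utf-8")
```

With `dict.update`, regenerated fields (such as a notebook's title) refresh automatically. If hand edits to those same fields should win instead, only copy keys that are not already present.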
Added to GW Repository Template: https://github.com/GLAM-Workbench/glam-workbench-template
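The initial crate created on Cookiecutter initialisation might look something like the sketch below. It is a minimal RO-Crate 1.1 document with a metadata descriptor, a root `Dataset`, and a `Person`. The config keys (`project_name`, `description`, `author_name`, `author_id`, `license`) are hypothetical stand-ins for whatever the template's cookiecutter.json defines.

```python
import json
from datetime import date
from pathlib import Path


def basic_crate(config: dict) -> dict:
    """Build a minimal RO-Crate 1.1 metadata document from a config dict."""
    # Use a supplied identifier (e.g. an ORCID URL) or fall back to a local #id.
    author_id = config.get("author_id") or "#" + config["author_name"].replace(" ", "_")
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": config["project_name"],
        "description": config["description"],
        "datePublished": date.today().isoformat(),
        "creator": {"@id": author_id},
    }
    if config.get("license"):
        root["license"] = config["license"]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            root,
            {"@id": author_id, "@type": "Person", "name": config["author_name"]},
        ],
    }


def write_crate(crate: dict, directory: Path) -> Path:
    """Write the crate to ro-crate-metadata.json in the given directory."""
    path = directory / "ro-crate-metadata.json"
    path.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return path
```

One option is to call this from a Cookiecutter post-generation hook (hooks/post_gen_project.py). That file is rendered as a Jinja template, so values like `"{{ cookiecutter.project_name }}"` can be passed in, and as far as I know the hook runs in the newly generated project directory.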
As part of the HASS Community Data Lab, I'm adding RO-Crate metadata to notebooks and repositories to enable information to be harvested and used to populate a tools registry.
The basic idea is that RO-Crate metadata would be saved in a notebook's metadata; then, when notebooks are pushed to a repository, an action would run to extract the metadata from the notebooks and save it into an RO-Crate JSON file. Some work on this has already been done as part of the GLAM Workbench repository template (thanks to ATAP).
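The extraction step might look roughly like the following sketch. It makes three assumptions: each notebook keeps its metadata under a hypothetical top-level key `rocrate`, notebooks sit in the repository root, and the crate already has a root `./` entity. Reading the raw JSON is enough here because nothing is written back to the notebooks.

```python
from __future__ import annotations

import json
from pathlib import Path

METADATA_KEY = "rocrate"  # hypothetical key under the notebook's top-level metadata


def notebook_entity(nb_path: Path, root: Path) -> dict | None:
    """Turn one notebook's stored metadata into an RO-Crate entity, if it has any."""
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    meta = nb.get("metadata", {}).get(METADATA_KEY)
    if not meta:
        return None
    entity = dict(meta)
    entity["@id"] = nb_path.relative_to(root).as_posix()
    entity.setdefault("@type", ["File", "SoftwareSourceCode"])
    entity.setdefault("encodingFormat", "application/x-ipynb+json")
    return entity


def collect_notebooks(root: Path, crate: dict) -> dict:
    """Replace notebook entities in the crate and list them in the root's hasPart."""
    graph = crate["@graph"]
    root_entity = next(e for e in graph if e["@id"] == "./")
    parts = []
    for nb_path in sorted(root.glob("*.ipynb")):
        entity = notebook_entity(nb_path, root)
        if entity is None:
            continue
        # Drop any previous copy of this notebook's entity, then add the fresh one.
        graph[:] = [e for e in graph if e["@id"] != entity["@id"]]
        graph.append(entity)
        parts.append({"@id": entity["@id"]})
    root_entity["hasPart"] = parts
    return crate
```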
Because I'm going to be working with existing repositories, rather than starting new ones, I'm going to have to add the necessary scripts to each repository, and find a (not too painful) way of adding the metadata to existing notebooks.
What metadata?
Metadata describing a repository, based on schema.org and the RO-Crate spec:
@id
identifier -- Zenodo DOI
@type -- (is this a RepositoryCollection?)
name
description
documentation -- link to GW section
version
license
url -- GitHub
Metadata describing a notebook, based on schema.org and the RO-Crate spec:
@type: ["File", "SoftwareSourceCode"] (or SoftwareApplication? or SoftwareWorkflow?)
name -- title of notebook
creators
description
programmingLanguage
runtimePlatform
softwareRequirements -- list ids of packages imported/used
codeRepository -- link to GitHub
documentation -- link to GW page
encodingFormat: "application/x-ipynb+json"
input -- source of data
conformsTo -- ?
about -- subjects?
keywords -- align with tags in GW?
license
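For softwareRequirements, a first pass could collect the top-level modules each notebook imports. Below is a minimal sketch using Python's `ast` module on the code cells. It assumes IPython magics and shell escapes (lines starting with `%` or `!`) can simply be dropped, and it skips any cell that still fails to parse. Module names are not always package names (e.g. `bs4` comes from beautifulsoup4), so mapping them to ids remains a separate step.

```python
from __future__ import annotations

import ast
import json
import sys
from pathlib import Path


def cell_source(cell: dict) -> str:
    """Notebook JSON stores source as either a string or a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def notebook_imports(nb_path: Path) -> set[str]:
    """Return top-level, non-stdlib module names imported in a notebook's code cells."""
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    modules: set[str] = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [
            line for line in cell_source(cell).splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    # sys.stdlib_module_names only exists on Python 3.10+; older versions skip this filter.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return modules - set(stdlib)
```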
I think about and keywords will take the most thought, as they will be important access points in the context of a tool registry. Need to use/develop a controlled list?
Writing metadata to notebooks
Notebooks are just JSON, so I could simply read, edit, and write them as JSON files, but it might be safer to use nbformat to ensure that everything conforms to the notebook file format.
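A minimal sketch of that nbformat round trip is shown below. It reads a notebook, derives a name and description from the first `# ` heading and the paragraph after it, and stores them under a hypothetical `rocrate` key in the notebook metadata. The heading heuristic and the default values are assumptions. Existing keys are never overwritten, so manual edits survive repeated runs. Note that `creator` here is schema.org's property name.

```python
from __future__ import annotations

import nbformat

METADATA_KEY = "rocrate"  # hypothetical key for the notebook's RO-Crate metadata


def title_and_intro(nb) -> tuple[str | None, str | None]:
    """Return the first '# ' heading in a markdown cell and the paragraph after it."""
    for cell in nb.cells:
        if cell.cell_type != "markdown":
            continue
        paragraphs = [p.strip() for p in cell.source.split("\n\n") if p.strip()]
        if paragraphs and paragraphs[0].startswith("# "):
            title = paragraphs[0].splitlines()[0][2:].strip()
            intro = paragraphs[1] if len(paragraphs) > 1 else None
            return title, intro
    return None, None


def add_basic_metadata(path: str, creator: str) -> dict:
    """Fill in missing metadata fields, validate, and write the notebook back."""
    nb = nbformat.read(path, as_version=4)
    title, intro = title_and_intro(nb)
    meta = dict(nb.metadata.get(METADATA_KEY, {}))
    defaults = {
        "name": title,
        "description": intro,
        "creator": creator,
        "programmingLanguage": "Python",
        "encodingFormat": "application/x-ipynb+json",
        "license": "MIT",
    }
    for key, value in defaults.items():
        # Only add fields that are missing, so manual edits are kept.
        if value is not None and key not in meta:
            meta[key] = value
    nb.metadata[METADATA_KEY] = meta
    nbformat.validate(nb)  # raises nbformat.ValidationError if the notebook is invalid
    nbformat.write(nb, path)
    return meta
```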
I think I'll probably create a script to add basic metadata to notebooks, then manually edit as required. Fields I could automatically populate:
@type
name -- from the title of the notebook
creators -- start with me
description -- extract first para after title?
programmingLanguage -- all Python
runtimePlatform -- can I get this from pyenv?
softwareRequirements -- get a list of Python imports, then need to map these to ids?
codeRepository -- get from git
documentation -- in most cases the path will be the same as the file title
encodingFormat: "application/x-ipynb+json"
input -- mostly Trove API?
conformsTo -- ?
license -- all MIT
Finish off RO-Crate action
See the pull request from ATAP on the repo template -- adjust and finish off.
Later on...
Once this is done, I should change the way the documentation pages are generated to pull as much as possible from the RO-Crate metadata, so I'm not managing the same info in different places.
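As a rough illustration of pulling documentation from the crate, the sketch below renders a simple Markdown page listing the notebooks in the root entity's `hasPart`. Markdown output, the page layout, and the function names are all assumptions. The helper accepts `hasPart` as either a list or a single unpacked object.

```python
import json
from pathlib import Path


def load_crate(path: Path) -> dict:
    """Read an ro-crate-metadata.json file."""
    return json.loads(path.read_text(encoding="utf-8"))


def as_list(value) -> list:
    """Normalise a property that may be missing, a single value, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def notebook_docs(crate: dict) -> str:
    """Render a simple Markdown page from the root entity and its notebooks."""
    entities = {entity["@id"]: entity for entity in crate["@graph"]}
    root = entities["./"]
    lines = [f"# {root.get('name', '')}", "", root.get("description", ""), ""]
    for ref in as_list(root.get("hasPart")):
        part = entities.get(ref["@id"], {})
        lines += [f"## {part.get('name', ref['@id'])}", ""]
        if "description" in part:
            lines += [part["description"], ""]
    return "\n".join(lines)
```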