GenomicsStandardsConsortium / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Making GSC repo FAIRer #534

Open only1chunts opened 1 year ago

only1chunts commented 1 year ago

As a good open-source community, we should be aiming to be FAIR. In the evolving world of FAIR software, metadata about the code itself is becoming a focal point. There is a movement, led by the Software Heritage archive, called CodeMeta: https://codemeta.github.io/

Essentially, it's a JSON-LD file that can be included with the code (in our case, in the GitHub repo) to describe the code with machine-readable metadata. They have even created a simple "generator" tool to help people create the JSON-LD file: https://codemeta.github.io/codemeta-generator/

I started creating it using the form, and below is what I ended up with. It definitely needs more of the authors and contributors added, as well as more "run-time environment" details (languages etc.):

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/CC0-1.0",
  "codeRepository": "https://github.com/GenomicsStandardsConsortium/mixs/",
  "dateModified": "2023-04-01",
  "downloadUrl": "https://github.com/GenomicsStandardsConsortium/mixs/releases/tag/mixs6.1.0",
  "name": "GSC-MIXS",
  "version": "6.1",
  "identifier": "https://github.com/GenomicsStandardsConsortium/mixs/releases/tag/mixs6.1.0",
  "description": "The Genomics Standards Consortium maintain the Minimum Information about any(x) sequence (MIxS) checklists. This code includes the source of truth of the current checklists as well as the tools to provide those checklists in multiple formats. It is envisaged that in the future, tools to validate checklists will also be included.",
  "applicationCategory": "checklist",
  "isPartOf": "https://gensc.org",
  "keywords": [
    "genomics",
    "checklists",
    "models",
    "ontologies",
    "data-sharing"
  ],
  "programmingLanguage": [
    "link-ml"
  ],
  "author": [
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0001-8815-0078",
      "givenName": "Ramona",
      "familyName": "Walls",
      "email": "rlwalls2008@gmail.com"
    }
  ],
  "contributor": [
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0002-1335-0881",
      "givenName": "Christopher",
      "familyName": "Hunter",
      "email": "only1chunts@hotmail.com"
    }
  ]
}

turbomam commented 1 year ago

LinkML generates JSON Schema and JSON-LD.

only1chunts commented 1 year ago

True, but I assume it doesn't produce CodeMeta JSON-LD.

turbomam commented 1 year ago

Good point.

Maybe we could generate it by converting pyproject.toml?

pbuttigieg commented 1 year ago

Some of the approaches we use for ODIS may be useful here. Here's a mix of those and some thoughts from my side.

If we do something like this, then the various indexing services and their bots will be able to discover GSC metadata more effectively, making it FAIR at scale. We can also harvest this into IOC-UNESCO ODIS to dovetail with the data feeds from/for the UN Ocean Decade Programme, the Ocean Biomolecular Observing Network (https://github.com/iodepo/odis-arch/issues/146).

This would also dovetail with the GSC MIOP project via BeBOP, which has a sitemap-based ODIS interface and shares metadata about omic protocols in ways compatible with ODIS (JSON-LD + schema.org). This was tested during the EU Horizon project TechOceanS, with ODIS metadata that the sitemap points to.
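A sitemap-based interface of this kind is essentially a `sitemap.xml` whose entries point at the JSON-LD documents to be harvested. The sketch below assumes the standard sitemaps.org format; `build_sitemap` is a hypothetical helper, and any GSC URLs it would list are placeholders, since where the files would live is still open.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialise the sitemap namespace as the default namespace (no prefix).
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Build a sitemap from (loc, lastmod) pairs, lastmod as YYYY-MM-DD."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # Write the XML declaration by hand so it always says UTF-8.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical location for the CodeMeta file on the GSC website.
    print(build_sitemap([("https://gensc.org/jsonld/mixs-codemeta.json",
                          "2023-04-01")]))
```

Updating `lastmod` whenever the JSON-LD changes lets harvesters skip documents they have already indexed.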

I suppose the JSON-LD would live on the GSC's website somewhere, perhaps embedded in pages or just in a file store.
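If the JSON-LD were embedded in pages, the usual schema.org pattern is a `<script type="application/ld+json">` element in the page. A minimal sketch, assuming the page is generated from Python; `jsonld_script_tag` is a hypothetical helper, not part of any existing GSC tooling.

```python
import json


def jsonld_script_tag(meta: dict) -> str:
    """Wrap a JSON-LD dict in a script element for embedding in HTML."""
    payload = json.dumps(meta, indent=2, ensure_ascii=False)
    # A literal "</" inside the JSON could close the <script> element early;
    # "<\/" is an equivalent JSON escape that browsers will not treat as a tag.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(jsonld_script_tag({
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": "GSC-MIXS",
    }))
```

Serving the same dict as a standalone file and as an embedded tag keeps a single source of truth for both the file store and the web pages.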

only1chunts commented 3 weeks ago

Further to this, there is now a CodeFair tool (https://codefair.io/) that can be integrated with a GitHub repo to help make it FAIR-compliant. (I haven't tried it; I only just found out about it!)