ramonawalls closed this issue 3 years ago
Using biolinkml and the YAML specification (here), we can generate RDF quickly and efficiently without the need for a separate script like the one I previously wrote (here). This makes for what I think is an easier-to-maintain workflow:
Decisions are still to be made about what types of OWL entities the mixs terms need to be. That is, will the terms be represented as:
A consideration in this decision is (I think) how we are intending to use the mixs-rdf. From what I can tell, we mainly need the RDF representation for:
The mixs standard is still going to be distributed as a spreadsheet, which we will produce using a query.
From this perspective, the main determining factor seems to be which representation the maintainers will be most comfortable working in. A caveat to consider is whether we wish to distribute the mixs-rdf as a specification for translating data. For example, in the NMDC metadata translation project we use the mixs terms as attributes, from which we then produce JSON.
cc @cmungall
The answer to the above question is that we will loosely treat mixs terms as object properties (see #9).
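To make the "terms as object properties" decision concrete, here is a minimal sketch of what one term could look like in Turtle. It uses only the Python standard library rather than the actual LinkML generators, and the base IRI and the label text are placeholders I made up for illustration, not the real MIxS namespace or term metadata.

```python
# Hypothetical base IRI; the real MIxS namespace is not stated in this thread.
BASE_IRI = "https://example.org/mixs/"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def term_to_turtle(name: str, label: str) -> str:
    """Render one term as an owl:ObjectProperty declaration in Turtle."""
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"<{BASE_IRI}{name}> a owl:ObjectProperty ;\n"
        f'    rdfs:label "{safe_label}" .\n'
    )


# The term name comes from the docs URL in this thread; the label is illustrative.
turtle_doc = PREFIXES + "\n" + term_to_turtle(
    "tot_depth_water_col", "total depth of water column"
)
print(turtle_doc)
```

In practice the LinkML tooling would produce this output from the YAML; the sketch only shows the shape of the result.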
Biolink ML code also generates documentation. Each term will be a page on our website that the URL can resolve to. See https://microbiomedata.github.io/nmdc-metadata/docs/tot_depth_water_col.html
LinkML is now well supported, but if it should go away, we can do a one-time conversion to JSON(-LD) to maintain the RDF. The specification is in YAML, an open format that is not going to go away.
There are not yet any publications we can cite.
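As a rough idea of what such a one-time conversion might involve, the sketch below turns an already-parsed schema into a JSON-LD document using only the standard library. The `schema` dict stands in for the parsed YAML (its `slots` / `description` layout follows the usual LinkML shape), and the base IRI, the description text, and the choice of `rdfs:comment` are assumptions for illustration only.

```python
import json

# Hypothetical base IRI; the real MIxS namespace is not stated in this thread.
BASE_IRI = "https://example.org/mixs/"

# Stand-in for the YAML schema after parsing; the description is a placeholder.
schema = {
    "slots": {
        "tot_depth_water_col": {"description": "placeholder description"},
    }
}


def slots_to_jsonld(parsed_schema: dict) -> dict:
    """Build a JSON-LD document with one owl:ObjectProperty node per slot."""
    graph = []
    for name, slot in parsed_schema.get("slots", {}).items():
        graph.append(
            {
                "@id": BASE_IRI + name,
                "@type": "owl:ObjectProperty",
                "rdfs:comment": (slot or {}).get("description", ""),
            }
        )
    return {
        "@context": {
            "owl": "http://www.w3.org/2002/07/owl#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
        "@graph": graph,
    }


jsonld_doc = slots_to_jsonld(schema)
print(json.dumps(jsonld_doc, indent=2))
```

Because the output is plain JSON-LD, any RDF toolkit could load it later without depending on LinkML.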
The decision to use linkml has been finalized, so I am closing this issue.
Use NMDC YAML code (i.e., BioLinkML) to convert MIxS 6 to RDF (i.e., generate a Python file for the mixs terms); numeric IRIs will need to be assigned using the ‘Unique MIXS ID’ column. Source: MIxS v6.0 terms, with updated IDs https://docs.google.com/spreadsheets/d/1QDeeUcDqXes69Y2RjU2aWgOpCVWo5OVsBX9MKmMqi_o/edit#gid=345753674
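A minimal sketch of the numeric IRI assignment, assuming the spreadsheet is exported as CSV. Only the ‘Unique MIXS ID’ column name comes from this issue; the name column header, the sample ID values, the `MIXS:` prefix format, and the base IRI are all made up for illustration.

```python
import csv
import io

# Hypothetical base IRI; the real MIxS namespace is not stated in this thread.
BASE_IRI = "https://example.org/mixs/"

# Made-up sample rows; the real sheet has many more columns and real IDs.
SAMPLE_CSV = """Structured comment name,Unique MIXS ID
tot_depth_water_col,MIXS:0000001
depth,MIXS:0000002
"""


def numeric_iri(mixs_id: str) -> str:
    """Turn an ID such as 'MIXS:0000001' into an IRI ending in its numeric part."""
    local = mixs_id.split(":", 1)[-1].strip()
    if not local.isdigit():
        raise ValueError(f"expected a numeric ID, got {mixs_id!r}")
    return BASE_IRI + local


def assign_iris(csv_text: str) -> dict:
    """Map each term name to the numeric IRI derived from its ID column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Structured comment name"]: numeric_iri(row["Unique MIXS ID"])
        for row in reader
    }


iris = assign_iris(SAMPLE_CSV)
for term_name, iri in iris.items():
    print(term_name, iri)
```

Rejecting non-numeric IDs up front keeps a malformed spreadsheet row from silently producing a bad IRI.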