GenomicsStandardsConsortium / mixs-rdf

Creative Commons Zero v1.0 Universal

Generate MIxS RDF file using NMDC code #28

Closed: ramonawalls closed this issue 3 years ago

ramonawalls commented 3 years ago

Use the NMDC YAML code (i.e., BioLinkML) to convert MIxS 6 to RDF, i.e., generate a Python file for the MIxS terms. Numeric IRIs will need to be assigned using the ‘Unique MIXS ID’ column. Source: MIxS v6.0 terms, with updated IDs: https://docs.google.com/spreadsheets/d/1QDeeUcDqXes69Y2RjU2aWgOpCVWo5OVsBX9MKmMqi_o/edit#gid=345753674

wdduncan commented 3 years ago

Using biolinkml and the YAML specification (here), we can generate RDF quickly and efficiently without the need for a separate script like the one I previously wrote (here). I think this makes for an easier-to-maintain workflow:

  1. mixs spreadsheet -> mixs.yaml
  2. mixs.yaml -- biolinkml --> mixs.owl

The command to convert to OWL in biolinkml is a one-liner. Another advantage of adopting the biolinkml workflow is that we can also produce JSON and JSON Schema versions.
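To make step 1 concrete, here is a minimal sketch of turning spreadsheet rows into a LinkML-style schema structure. It is not the project's actual conversion code. Only the ‘Unique MIXS ID’ column comes from this issue; the other column names, the row values, the numeric IDs, and the `example.org` IRIs are placeholders I made up for illustration. The result is printed as JSON (which YAML parsers generally accept) to keep the sketch free of third-party dependencies.

```python
import csv
import io
import json

# Hypothetical excerpt of the MIxS spreadsheet exported as CSV.
# Only "Unique MIXS ID" is a column named in this issue; the other
# column names and all row values are placeholders.
SHEET = """Structured comment name,Definition,Unique MIXS ID
tot_depth_water_col,Placeholder definition,MIXS:0000001
samp_size,Placeholder definition,MIXS:0000002
"""


def sheet_to_schema(csv_text):
    """Build a LinkML-style schema dict with one slot per spreadsheet row."""
    slots = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        slots[row["Structured comment name"]] = {
            "description": row["Definition"],
            # slot_uri carries the identifier taken from the ID column
            "slot_uri": row["Unique MIXS ID"],
        }
    return {
        "id": "https://example.org/mixs",  # placeholder schema IRI
        "name": "mixs",
        "prefixes": {"MIXS": "https://example.org/mixs/"},  # placeholder
        "slots": slots,
    }


schema = sheet_to_schema(SHEET)
print(json.dumps(schema, indent=2))
```

The saved schema file would then be the input to step 2. LinkML exposes its generators as command-line tools such as `gen-owl` and `gen-json-schema`; check the documentation of the installed version for the exact options.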

A decision still has to be made about what types of OWL entities the mixs terms need to be. That is, will the terms be represented as:

A consideration in this decision is (I think) how we intend to use the mixs-rdf. From what I can tell, we mainly need the RDF representation for:

  1. managing URIs for terms, packages, and checklists
  2. relating terms to packages and checklists

The mixs standard is still going to be distributed as a spreadsheet, which we will produce using a query.
From this perspective, the main determiner seems to be which representation the maintainers will be most comfortable working in. A caveat to consider is whether we wish to distribute the mixs-rdf as a specification for translating data. For example, in the NMDC metadata translation project we use the mixs terms as attributes, for which we then produce json.

cc @cmungall

ramonawalls commented 3 years ago

The answer to the above question is that we will loosely treat mixs terms as object properties (see #9).
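As a rough illustration of what treating a term as an object property could look like in the RDF, the sketch below writes a Turtle declaration by hand. It is not output from biolinkml: the base IRI and the numeric ID `0000001` are placeholders, and `term_to_turtle` is a hypothetical helper, not part of any library.

```python
# Placeholder namespace; the real MIxS base IRI is not given in this thread.
MIXS_BASE = "https://example.org/mixs/"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
)


def term_to_turtle(numeric_id, label):
    """Return a Turtle snippet declaring one term as an owl:ObjectProperty."""
    return (
        f"<{MIXS_BASE}{numeric_id}> a owl:ObjectProperty ;\n"
        f'    rdfs:label "{label}" .\n'
    )


# "0000001" is a made-up numeric ID used only for illustration.
turtle = PREFIXES + term_to_turtle("0000001", "tot_depth_water_col")
print(turtle)
```

The numeric ID forms the IRI while the human-readable term name is kept as the label, which matches the idea of assigning numeric IRIs from the ‘Unique MIXS ID’ column.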

The BioLinkML code also generates documentation. Each term will have a page on our website that its URL can resolve to. See https://microbiomedata.github.io/nmdc-metadata/docs/tot_depth_water_col.html

ramonawalls commented 3 years ago

LinkML is now well supported, but if it should go away, we can do a one-time conversion to JSON(-LD) to maintain the RDF. The source is in YAML, an open format that is not going to go away.

There are not yet any publications we can cite.

ramonawalls commented 3 years ago

The decision to use linkml has been finalized, so I am closing this issue.