OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

Request for MIXS prefix and PURLs #822

Closed ramonawalls closed 4 years ago

ramonawalls commented 5 years ago

The Compliance and Interoperability Group (CIG) of the Genomic Standards Consortium would like to request an OBO Foundry namespace for metadata terms that are part of the Minimum Information about any (x) Sequence (MIxS) standard. We would like to use the prefix “MIXS”. Ramona Walls (rlwalls2008@gmail.com) will be the contact person, and our issue tracker is at https://github.com/GenomicsStandardsConsortium/mixs/issues.

MIxS terms are widely used to describe BioSamples and sequence data and are required by the INSDC databases. Despite wide adoption, the technical implementation of MIxS is out of date, and the CIG is working to modernize it. Permanent, redirectable, resolvable identifiers for terms are an important part of that effort.

Currently, most applications (including NCBI and EBI) simply use the term labels, either as the full label or as a Linux-friendly short name. Biodiversity Information Standards (TDWG) is graciously providing hosting for resolvable URL names for MIxS (https://terms.tdwg.org/wiki/Class:MIxS), but these URLs are not redirectable. As we update our backend system, we would like to move to PURL-type identifiers whose corresponding landing pages we can control from within our own GitHub organization.

We realize that MIxS terms do not make up an ontology, and therefore are outside the normal scope of the OBO Foundry. However, we think OBO PURLs would be a great home because 1) the infrastructure nicely suits our needs, 2) there is overlap between the OBO Foundry Operations Committee and the CIG (Lynn Schriml and Ramona Walls are members of both), which facilitates cross-governance, 3) several OBO Foundry ontologies would like to reuse terms from MIxS, which will be much easier if they are in the OBO library, and 4) repositories trying to implement the MIxS checklists (e.g., GigaDB, CyVerse) will find a stable list with auditable update tracking an invaluable tool.

We are open to suggestions about how to best organize the terms for inclusion within the OBO Library. My inclination is to create an OWL file that includes all of the MIxS terms as data properties. Then anyone who wants to use the MIxS properties in their ontology could just import them. Some of the properties, however, specify an ontology term as their target (e.g., https://terms.tdwg.org/wiki/mixs:env_feature), and those might be handled somewhat differently. It is a matter of sorting out how we want to handle MIxS as RDF, and that discussion is ongoing.
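As a rough illustration of the "OWL file of data properties" idea, here is a minimal Python sketch that emits Turtle declaring each term as an owl:DatatypeProperty. The base IRI http://purl.obolibrary.org/obo/MIXS_, the numeric identifier, and the function name are all hypothetical (no such identifiers have been assigned); only the term label comes from MIxS.

```python
# Hypothetical base IRI: no MIXS PURLs have actually been assigned.
HYPOTHETICAL_BASE = "http://purl.obolibrary.org/obo/MIXS_"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def data_properties_turtle(terms, base=HYPOTHETICAL_BASE):
    """Return a Turtle document declaring each (local_id, label) pair
    as an owl:DatatypeProperty with an rdfs:label."""
    blocks = [PREFIXES]
    for local_id, label in terms:
        # Escape backslashes and double quotes for a Turtle string literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append(
            f"<{base}{local_id}> a owl:DatatypeProperty ;\n"
            f'    rdfs:label "{escaped}" .\n'
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # "0000001" is an invented identifier used only for illustration.
    print(data_properties_turtle([("0000001", "collection_date")]))
```

Properties whose values are ontology terms would not fit this pattern as-is and would need separate treatment.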

cmungall commented 5 years ago

The use case and justification make sense. Given that we already house many application ontologies, I see no reason not to include this.

Are you sure about data properties? So even if someone enters an ENVO class, you would model it as a string value with the ENVO label or ENVO CURIE? (UPDATE: from https://github.com/GenomicsStandardsConsortium/mixs/issues/18 it seems like the string value may be a concatenation like <CURIE> [<LABEL>]. I think there is value in having the ability to make a 'normalized mixs' JSON-LD/RDF file where this is represented directly as a URI.)
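To make the normalization idea concrete, here is a minimal Python sketch that splits a "<CURIE> [<LABEL>]" string and expands the CURIE to an OBO PURL. It assumes the angle brackets are placeholders rather than literal characters, that the label part is optional, and that the prefix resolves under http://purl.obolibrary.org/obo/; the function name and the sample value are illustrative only.

```python
import re

OBO_BASE = "http://purl.obolibrary.org/obo/"

# Assumed shape: PREFIX:LOCALID, optionally followed by "[label]".
_VALUE = re.compile(
    r"^\s*(?P<prefix>[A-Za-z][A-Za-z0-9]*):(?P<local>[A-Za-z0-9_]+)"
    r"\s*(?:\[(?P<label>[^\]]*)\])?\s*$"
)


def normalize_value(value):
    """Return (iri, label) for a 'CURIE [label]' string, or None if the
    string does not look like a CURIE (for example, free text)."""
    m = _VALUE.match(value)
    if m is None:
        return None
    iri = f"{OBO_BASE}{m['prefix'].upper()}_{m['local']}"
    label = m["label"].strip() if m["label"] is not None else None
    return iri, label


if __name__ == "__main__":
    # Sample value for illustration only.
    print(normalize_value("ENVO:00000446 [terrestrial biome]"))
```

Returning None for unparseable input lets a caller keep the original string as a plain literal instead of guessing at a URI.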

Do you see this as just housing properties, or also values? How do you intend to represent value sets (i.e., constraining a property's values to come from a branch of an ontology, or from a defined set of values)? Cc @mbrush, who has thought a lot about this for clinical CDEs.

Although I generally don't like the strategy of distancing oneself from the science being modeled, I think in this case there may be value in modeling these things as information artefacts.

zhengj2007 commented 5 years ago

The microbiome datasets I worked on were collected using the MIxS standard. To harmonize and integrate the datasets from different resources, we mapped the terms to OBO Foundry ontologies. During this process, I converted the MIxS standard, including 14 environment packages, to OWL format, which is available on WebProtege: https://webprotege.stanford.edu/#projects/616e2fea-3370-46a2-8bfc-d11c0b169a92/edit/Classes

Since the MIxS standard is just a list of terms without any structure, we reorganized the terms following an ontological framework and used them as search filters. Please see: https://webprotege.stanford.edu/#projects/f15a2d5f-f8e7-4262-ae47-b02c291e9753/edit/Classes. The OWL file includes the MIxS standard along with other sample details we need.

In my view, the MIxS standard is not an ontology. I don't think it is suitable to register it as an OBO Foundry ontology.

Chris, Richard, Lynn and I were involved in the NIAID Human Pathogen & Vector Sequencing Metadata Standards work (https://www.niaid.nih.gov/research/human-pathogen-and-vector-sequencing-metadata-standards). The terms in the GSCID/BRC Project and Sample Application Standard – Core Project, Core Sample, and Sequencing Assay were mapped to OBO Foundry ontologies, and added to OBI if no matching ontology terms were found. As a result, all terms in the standards are available in OBI, and the OBI NIAID GSC BRC view OWL file is generated from OBI. It is available at: http://www.ontobee.org/ontology/OBI-NIAID-GSC-BRC-view (Reference: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099979)

It may be an approach that MIxS can consider.

ramonawalls commented 5 years ago

Thanks for the feedback, @zhengj2007. I will look into all the links you provided.

@cmungall, what I was trying to say in the last paragraph of my request is that simple annotation properties are not appropriate for those that require an ontology term as the value, so we are in agreement there.

Overall, GSC is not prepared to fully model MIxS as an ontology due to lack of resources, and I hesitate to add more semantics than would normally be needed by users of MIxS. My initial plan was to follow practices similar to what Darwin Core did when they converted their terms to RDF (https://dwc.tdwg.org/rdf/). Nonetheless, I certainly appreciate the suggestions to model MIxS more fully and will investigate this.

@mcourtot, I think you also did some work with MIxS. Do you have any suggestions or input?

ramonawalls commented 4 years ago

We have decided to use w3-ids for the GSC, so I will close this request. Thanks!