GenomicsStandardsConsortium / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Consensus on ontology term format #91

Closed raissameyer closed 9 months ago

raissameyer commented 3 years ago

Describe the bug

MIxS_v5 gives unclear guidance on ontology term formatting in the "Examples" column. The two suggested formats follow the structures [EFO:EFO_0001779] and [ENVO:00001998]. This inconsistency in guidance, and thus in use, impedes automated analyses. Please decide which format should be followed and update the guidance accordingly.

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'https://press3.mcs.anl.gov/gensc/mixs/'
  2. Download MIxS Checklist v5.0: mixs_v5
  3. Open the downloaded file 'mixs_v5' and go to tab 'MIxS'
  4. Scroll down to row 5 'experimental_factor', scroll right to column F 'Example'
  5. Read the content of the field and note that it says "time series design [EFO:EFO_0001779]"
  6. Scroll down to row 12 'env_broad_scale', scroll right to column F 'Example'
  7. Read the content of the field and note that it says "forest biome [ENVO:01000174]"

Expected behavior

The suggested format should be consistent across all examples.

Additional context

If you intend to allow users to also use terms that don't originate from the ontologies you propose but are imported into them, I'd suggest going with Example 1 [EFO:EFO_0001779]. Otherwise, the restriction against using imported terms should be communicated.
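To make the two styles concrete, here is a minimal Python sketch that tells them apart. The names `REPEATED_STYLE`, `BARE_STYLE`, and `id_style` are hypothetical and are not part of MIxS or any ontology tooling. The patterns are assumptions based only on the two examples in this issue.

```python
import re

# Assumed shapes, derived only from the two examples in this issue:
#   Example 1: PREFIX:NAMESPACE_digits   e.g. EFO:EFO_0001779
#   Example 2: PREFIX:digits             e.g. ENVO:01000174
REPEATED_STYLE = re.compile(r"^[A-Za-z]+:[A-Za-z]+_\d+$")
BARE_STYLE = re.compile(r"^[A-Za-z]+:\d+$")


def id_style(term_id):
    """Classify a term id as one of the two styles seen in the Examples column."""
    if REPEATED_STYLE.match(term_id):
        return "prefix:NAMESPACE_number"
    if BARE_STYLE.match(term_id):
        return "prefix:number"
    return "unknown"


for example in ("EFO:EFO_0001779", "ENVO:01000174"):
    print(example, "->", id_style(example))
```

In the Example 1 style, the namespace after the colon does not have to repeat the prefix before it, which is why that style can carry imported terms.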

lschriml commented 3 years ago

Hello Raissa, Thank you for pointing out this inconsistency. We are currently preparing MIxS v6.0 and will address this issue.

Cheers, Lynn

turbomam commented 1 year ago

Excellent issue @raissameyer. This is something I really care about, so I'm going to dump my thoughts. Let me know if I'm just dancing around your concern.

As of v6.2.0, the MIxS schema doesn't have any mechanism for validating the semantics of ontology term values in data. We do have a regular expression that syntactically looks for a label part followed by a term id part, enclosed in square brackets.
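A minimal sketch of what such a syntactic check could look like in Python. The pattern below is an assumption written for illustration, not the regular expression shipped in the MIxS schema, and `TERM_PATTERN` and `parse_term` are hypothetical names.

```python
import re

# Hypothetical pattern (NOT the one in the MIxS schema):
# a free-text label, whitespace, then [PREFIX:LOCAL_ID] in square brackets.
TERM_PATTERN = re.compile(
    r"^(?P<label>[^\[\]]+?)\s+"
    r"\[(?P<prefix>[A-Za-z][A-Za-z0-9_]*):(?P<local_id>[A-Za-z0-9_]+)\]$"
)


def parse_term(value):
    """Return (label, prefix, local_id), or None if the value does not match."""
    match = TERM_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.group("label"), match.group("prefix"), match.group("local_id")


print(parse_term("forest biome [ENVO:01000174]"))
print(parse_term("time series design [EFO:EFO_0001779]"))
print(parse_term("ENVO:01000174"))  # no label and no brackets, so no match
```

Both example styles from the original report pass a check like this, so a purely syntactic pattern does not by itself settle which id style is preferred.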

The schema isn't trying to define all legal external prefixes either.

For some of the MIxS terms, we could create enumerated lists of allowed ontology terms in the desired notation, but that would require a lot of maintenance, and somebody would inevitably feel that their preferred term was left off of the enumeration.

LinkML does have a mechanism by which you can specify one or more root terms from one or more ontologies as legal values for a term/slot, or even do algebra, like removing an unwanted path in the ontology. Before making a MIxS release, we would run the schema through the Ontology Access Kit's poorly documented vskit command to retrieve and freeze the legal foreign terms for any MIxS term.

But there's currently one problem: the legal values would just be the ontology id, "ENVO:01000174", not the ontology label and id, "forest biome [ENVO:01000174]".

@cmungall and I have talked about adding support for templating the legal foreign terms like that, but I don't think any action has been taken yet.
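As a rough illustration of the templating idea, here is a sketch in plain Python. It does not use LinkML or the Ontology Access Kit. `LABELS` and `format_term` are hypothetical, and the label table is hard-coded from the example in this thread. In practice the labels would have to come from an ontology lookup, which is not shown here.

```python
# Hypothetical id-to-label table; the single entry is taken from this thread.
LABELS = {
    "ENVO:01000174": "forest biome",
}


def format_term(term_id, labels):
    """Render a bare term id in the 'label [id]' notation.

    Raises KeyError if no label is known for the id.
    """
    label = labels[term_id]
    return f"{label} [{term_id}]"


print(format_term("ENVO:01000174", LABELS))
```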

ramonawalls commented 9 months ago

This has been standardized through the use of LinkML. If additional work is needed, there should be a new issue.