GenomicsStandardsConsortium / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Merge MIXS:0000253 and MIXS:0000254 #73

Closed only1chunts closed 3 years ago

only1chunts commented 4 years ago

Current term details

Term name - host infra-specific name
Term ID - MIXS:0000253 
Structured comment name - host_infra_specific_name 
Definition - Taxonomic information about the host below subspecies level name
Expected value - 
Value syntax - {text}
Example - borealis
Preferred unit - 
Package(s) - host-associated AND plant-associated

NB - partner term MIXS:0000254 "host_infra_specific_rank"

@lschriml says :

seems to be equivalent to core: subspecific genetic lineage

Term name - subspecific genetic lineage
Term ID - MIXS:0000020
Structured comment name - subspecf_gen_lin
Definition - This should provide further information about the genetic distinctness of the sequenced organism by recording additional information e.g. serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. It can also contain alternative taxonomic information. It should contain both the lineage name, and the lineage rank, i.e. biovar:abc123
Expected value - genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype
Value syntax - {rank name}:{text}
Example - serovar:Newport
Preferred unit - 
Package(s) - CORE

Suggested updates

The only difference between subspecf_gen_lin and host_infra_specific_name is that the latter is specific to the host of the sample; perhaps the definitions need to be updated to make that clear? The CORE term subspecf_gen_lin includes the rank as part of the value, whereas host_infra_specific_name requires the additional term host_infra_specific_rank. For consistency, these two host terms should be merged so that rank and name are recorded in one field.
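A minimal sketch of what that merge could look like for existing records, assuming sample metadata is held as a plain dictionary keyed by structured comment name. The function name `merge_rank_and_name`, the placeholder key `merged_lineage`, and the rank value `variety` are all hypothetical and chosen only for illustration; `borealis` is the example value from MIXS:0000253.

```python
def merge_rank_and_name(record,
                        rank_key="host_infra_specific_rank",
                        name_key="host_infra_specific_name",
                        merged_key="merged_lineage"):  # hypothetical key name
    """Return a copy of `record` with rank and name combined as 'rank:name'."""
    rank = record.get(rank_key)
    name = record.get(name_key)
    if not rank or not name:
        # One half is missing: leave the record unchanged rather than guess.
        return dict(record)
    # Drop the two old fields and add the combined one.
    merged = {k: v for k, v in record.items() if k not in (rank_key, name_key)}
    merged[merged_key] = f"{rank.strip()}:{name.strip()}"
    return merged


# Illustrative record; the rank value is an assumption, not taken from the spec.
sample = {
    "host_infra_specific_rank": "variety",
    "host_infra_specific_name": "borealis",
}
merged_sample = merge_rank_and_name(sample)
print(merged_sample)  # {'merged_lineage': 'variety:borealis'}
```

Records that carry only one of the two fields are returned unchanged, so they can be reviewed by hand instead of being silently rewritten.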

only1chunts commented 3 years ago

@anjijohnston you're currently not appearing in the list of options to assign to! I guess we need to get your username into the repo somehow, hopefully tagging you here will do that!

ramonawalls commented 3 years ago

We will merge 253 and 254 into a single term that includes rank and name, to be consistent with the core term for subspecific genetic lineage (MIXS:0000020).

Another problem with 253 and 254 is that they use "infra-specific", whereas 020 uses subspecific. There are no terms for subspecific name, but it is included in the NCBI taxonomy.

Structured comment name should be host_subspecf_gen_lin, but that is one character too many. Suggest using host_subspecf_genlin.
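A quick way to check candidate names against the length limit. This is only a sketch: the limit of 20 characters is an assumption inferred from the remark that the 21-character name is one too many, and it is not stated anywhere in this thread. The helper `fits_limit` is hypothetical.

```python
# Assumption: limit inferred from the comment above, not stated in the thread.
MAX_NAME_LENGTH = 20


def fits_limit(name, limit=MAX_NAME_LENGTH):
    """Return True if a structured comment name is within the length limit."""
    return len(name) <= limit


for candidate in ("host_subspecf_gen_lin", "host_subspecf_genlin"):
    print(candidate, len(candidate), fits_limit(candidate))
# host_subspecf_gen_lin 21 False
# host_subspecf_genlin 20 True
```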

New term:

Term name - host subspecific genetic lineage
Term ID - MIXS:0001126 (need to mint a new ID, since the others are being deprecated)
Structured comment name - host_subspecf_genlin 
Definition - Information about the genetic distinctness of the host organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123
Expected value - Genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, plus genetic lineage name.
Value syntax - {rank name}:{text}
Example - subvariety:glabrum
Preferred unit - 
Package(s) - host-associated AND plant-associated

I also suggest fixing the definition of 020 to word it like a definition and get rid of the sentence "It can also contain alternative taxonomic information," because it is just awful to have two meanings for one term.

Term name - subspecific genetic lineage
Term ID - MIXS:0000020
Structured comment name - subspecf_gen_lin
Definition - Information about the genetic distinctness of the sequenced organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123
Expected value - genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, plus genetic lineage name.
Value syntax - {rank name}:{text}
Example - serovar:Newport
Preferred unit - 
Package(s) - CORE

For both of these, cardinality should be 0-n, instead of 1, since there may be more than one notation or classification.
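To make the value syntax and the 0-n cardinality concrete, here is a minimal sketch of how a consumer might parse these values. The function names `parse_lineage` and `parse_lineages` are hypothetical, and representing multiple values as a Python list is an assumption; the thread does not say how repeated values are serialized.

```python
def parse_lineage(value):
    """Split one '{rank name}:{text}' value into (rank, name)."""
    # Split on the first colon only, so the text part may itself contain colons.
    rank, sep, name = value.partition(":")
    rank, name = rank.strip(), name.strip()
    if not sep or not rank or not name:
        raise ValueError(f"expected '{{rank name}}:{{text}}', got {value!r}")
    return rank, name


def parse_lineages(values):
    """Parse zero or more lineage values (cardinality 0-n)."""
    return [parse_lineage(v) for v in values]


print(parse_lineage("serovar:Newport"))        # ('serovar', 'Newport')
print(parse_lineages(["subvariety:glabrum"]))  # [('subvariety', 'glabrum')]
print(parse_lineages([]))                      # []
```

A value with no rank, such as `glabrum` on its own, raises `ValueError` instead of being accepted with an empty rank.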

ramonawalls commented 3 years ago

I updated the spreadsheet and moved to Review in Progress.

ramonawalls commented 3 years ago

@lschriml I did not realize you had a spreadsheet for tracking term IDs. I looked through the editing spreadsheet and tried to choose a number that was free (MIXS:0001126), but I might be wrong. Could you please add this new term to your master sheet and update the ID in our editing sheet if needed?

ramonawalls commented 3 years ago

@lschriml I assigned to you for review. Please close when done. Thanks!

lschriml commented 3 years ago

Hello @ramonawalls -- I updated that ID, it is now: MIXS:0001318.

[MIxS-Ag](https://docs.google.com/spreadsheets/d/1cR-C07PBk8Dufil8LqzR_Xg_OKzBuZOSw8AZpqEFYHc/edit#gid=0) was using the two deprecated terms; I updated it.

The MIxS ID list is at: https://docs.google.com/spreadsheets/d/1p5Ziciznkk9er99aPhc7ed8NDE0oUKw0rrBlEOYUtiA/edit#gid=724395808

Tabs:

- unique term IDs (in alphabetical order)
- MIxS6 new IDs --> this is the list of IDs I use to create a new ID. At the bottom of the list, add a new term and its ID. Once a new term and its new ID are selected from this spreadsheet, the term is added to the 'unique term IDs' tab and sorted.
- core, packages IDS (MIxS terms across MIxS Core and MIxS packages)
- Checklist_Package IDS
- MIxS ID space

Cheers, Lynn

ramonawalls commented 3 years ago

Thank you @lschriml ! Closing this and moving to Done.