GenomicsStandardsConsortium / mixs

"Minimum Information about any (X) Sequence" (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

MIXS:0000056 and MIXS:0000058 need expected values swapped #396

Closed only1chunts closed 5 months ago

only1chunts commented 2 years ago

The Definition, Expected value, Value syntax and Example fields all appear to have been swapped between the "assembly software" and "assembly quality" terms, i.e. the values listed for assembly software should be those currently listed for assembly quality, and vice versa.

Current term details

Term name - assembly software
Term ID - MIXS:0000056
Structured comment name - assembly_software
Definition - The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft:Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated
Expected value - enumeration
Value syntax - [Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)]
Example - High-quality draft genome
Package(s) - agriculture

Term name - assembly quality
Term ID - MIXS:0000058
Structured comment name - assembly_quality
Definition - Tool(s) used for assembly, including version number and parameters
Expected value - name and version of software, parameters used
Value syntax - {software};{version};{parameters}
Example - metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise
Package(s) - agriculture

Suggested update(s)

Term ID - MIXS:0000056
Definition - Tool(s) used for assembly, including version number and parameters
Expected value - name and version of software, parameters used
Value syntax - {software};{version};{parameters}
Example - metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise

Term ID - MIXS:0000058
Definition - The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft:Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated
Expected value - enumeration
Value syntax - [Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)]
Example - High-quality draft genome
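
To make the two value syntaxes concrete, here is a minimal Python sketch of how a submitted value could be checked against each term once the fields are swapped back. The function names and the regular expression are illustrative assumptions, not part of any MIxS tooling. In particular, the pattern assumes that `{software}` and `{version}` contain no semicolons and that everything after the second semicolon is `{parameters}`.

```python
import re

# Permissible values copied from the "Value syntax" enumeration above.
ASSEMBLY_QUALITY_VALUES = {
    "Finished genome",
    "High-quality draft genome",
    "Medium-quality draft genome",
    "Low-quality draft genome",
    "Genome fragment(s)",
}

# Assumed reading of {software};{version};{parameters}:
# two semicolon-free parts, then a free-text parameters part.
ASSEMBLY_SOFTWARE_PATTERN = re.compile(r"^([^;]+);([^;]+);(.+)$")


def is_valid_assembly_quality(value: str) -> bool:
    """Return True if value is one of the enumerated quality categories."""
    return value in ASSEMBLY_QUALITY_VALUES


def parse_assembly_software(value: str):
    """Split a value into software, version and parameters.

    Returns a dict, or None if the value does not fit the assumed syntax.
    """
    match = ASSEMBLY_SOFTWARE_PATTERN.match(value)
    if match is None:
        return None
    software, version, parameters = (part.strip() for part in match.groups())
    if not (software and version and parameters):
        return None
    return {"software": software, "version": version, "parameters": parameters}
```

With the fields as currently published, the software example would be rejected by the enumeration check and the quality example would fail to parse as software, which is the mismatch this issue describes.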

turbomam commented 2 years ago

Do you intend to change it in the Google Sheet or in the LinkML model?

Here's the existing LinkML modeling. Note that we should be moving the content of the string_serialization fields to structured_pattern fields (#388).

  assembly_qual:
    is_a: sequencing field
    title: assembly quality
    description: "The assembly quality category...or for which no genome size could be estimated"
    range: assembly_qual_enum
    multivalued: false
    examples:
    - value: High-quality draft genome
    comments: []
    aliases:
    - assembly quality
    annotations:
      expected_value: enumeration
    slot_uri: MIXS:0000056

where

  assembly_qual_enum:
    permissible_values:
      Finished genome: {}
      High-quality draft genome: {}
      Medium-quality draft genome: {}
      Low-quality draft genome: {}
      Genome fragment(s): {}
  # MAM 2022-03-22 low;{percentage}: {}

and

  assembly_software:
    is_a: sequencing field
    title: assembly software
    description: Tool(s) used for assembly, including version number and parameters
    range: string
    multivalued: false
    examples:
    - value: metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise
    comments: []
    aliases:
    - assembly software
    annotations:
      expected_value: name and version of software, parameters used
    string_serialization: '{software};{version};{parameters}'
    slot_uri: MIXS:0000058

I can do it as a PR to the LinkML model.
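
As a rough sketch of what such a change could amount to: in the LinkML snippets above, exchanging the two `slot_uri` values would pair each ID with the other slot's definition, expected value, syntax and example. The Python below illustrates that mechanical swap on plain dicts. The helper name `swap_slot_uris` and the trimmed-down slot dicts are hypothetical, and whether the real fix should move the `slot_uri` values or the descriptive fields is a decision for the maintainers.

```python
from copy import deepcopy


def swap_slot_uris(slots: dict, first: str, second: str) -> dict:
    """Return a copy of slots with the slot_uri of two slots exchanged."""
    updated = deepcopy(slots)  # leave the caller's data untouched
    updated[first]["slot_uri"] = slots[second]["slot_uri"]
    updated[second]["slot_uri"] = slots[first]["slot_uri"]
    return updated


# Trimmed-down versions of the two slots quoted above (illustrative only).
slots = {
    "assembly_qual": {
        "title": "assembly quality",
        "range": "assembly_qual_enum",
        "slot_uri": "MIXS:0000056",
    },
    "assembly_software": {
        "title": "assembly software",
        "range": "string",
        "slot_uri": "MIXS:0000058",
    },
}

fixed = swap_slot_uris(slots, "assembly_qual", "assembly_software")
```

After the swap, `fixed["assembly_software"]["slot_uri"]` is `MIXS:0000056` and `fixed["assembly_qual"]["slot_uri"]` is `MIXS:0000058`, matching the suggested update in the original comment.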

only1chunts commented 2 years ago

I would like to do this as a group exercise if possible, so that I can learn how to do these sorts of updates correctly. Can we schedule it for the next tech working group call on May 10th?

Note: I updated the original comment, as I realised that the definition was also swapped!

only1chunts commented 1 year ago

Whilst this is an important update to make, it should not affect the SoT transformation process, so it is not vital to fix before that is completed.

Proposal

Update the Definition, Expected value, Value syntax and Example fields of both MIXS:0000056 and MIXS:0000058 as shown in the original comment above.

only1chunts commented 5 months ago

These changes appear to be correct in the current main branch, so I am closing this ticket as complete.