GenomicsStandardsConsortium / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Update definition of assembly quality [MIXS:0000056] #147

Open only1chunts opened 3 years ago

only1chunts commented 3 years ago

Current term details

Term name -  assembly quality
Term ID - [MIXS:0000056]
Structured comment name - assembly_qual
Definition - The assembly quality category is based on sets of criteria outlined for each assembly quality category.
For MISAG/MIMAG:
Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better.
High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs.
Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics.
Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics.
Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length.
For MIUVIG:
Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units.
High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete.
Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated.
Expected value - enumeration
Value syntax - [Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)]
Example - High-quality draft genome
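
As a minimal sketch of how the value syntax above could be checked in practice, the following Python snippet validates an `assembly_qual` value against the five enumerated strings. The names `ASSEMBLY_QUAL_VALUES` and `is_valid_assembly_qual` are hypothetical and not part of MIxS or any MIxS tooling; only the permitted strings are taken from the term details.

```python
# Permitted values copied from the "Value syntax" line of MIXS:0000056.
# The constant and function names are illustrative assumptions.
ASSEMBLY_QUAL_VALUES = (
    "Finished genome",
    "High-quality draft genome",
    "Medium-quality draft genome",
    "Low-quality draft genome",
    "Genome fragment(s)",
)


def is_valid_assembly_qual(value: str) -> bool:
    """Return True if value matches one of the permitted enumeration strings.

    Surrounding whitespace is ignored; case and punctuation must match exactly.
    """
    return value.strip() in ASSEMBLY_QUAL_VALUES


# The documented example value passes the check.
print(is_valid_assembly_qual("High-quality draft genome"))
# A category label as worded in the MISAG/MIMAG part of the definition does not.
print(is_valid_assembly_qual("High Quality Draft"))
```

Under this strict check, the category labels used in the MISAG/MIMAG part of the definition (for example "High Quality Draft") do not match the enumeration strings, which may be worth keeping in mind when rewording the definition.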

Suggested update(s)

Comments from v6 review discussions:

Update the description to include the previous item name. Add the previous term as an individual item and mark it as obsolete. Keep in the core; add a field for previous names/obsolete terms. Can we rewrite the definition to cover all possible assembly use cases (MISAG, MIMAG, MIUVIG fields)? In MIxSv4, this term was 'finishing strategy'.

Additional context

No changes were made in the v6 review, but discussions indicated that we may need to find a way to include old names of things somewhere in the term details. Flagged for further discussion in the v7 review.

only1chunts commented 1 year ago

The definition is highly metagenome-centric; we should look to make it more generic to allow for use with genome sequences as well (e.g. plant and animal genomes). We should also look at the overlap with MIXS:0000069 "Completeness score" to ensure that the two are distinct; at present, the completeness score definition also includes High, medium and low quality as valid suggested values.

lschriml commented 1 year ago

We could ask Emiley to weigh in on this term, as she and her group developed this set of terms.

On Fri, Nov 4, 2022 at 10:56 AM Chris Hunter @.***> wrote:

The definition is highly metagenome-centric; we should look to make it more generic to allow for use with genome sequences as well (e.g. plant and animal genomes). We should also look at the overlap with MIXS:0000069 "Completeness score" to ensure that the two are distinct; at present, the completeness score definition also includes High, medium and low quality as valid suggested values.


--
Lynn M. Schriml, Ph.D.
Associate Professor
Institute for Genome Sciences
University of Maryland School of Medicine
Department of Epidemiology and Public Health
670 W. Baltimore St., HSFIII, Room 3061
Baltimore, MD 21201
P: 410-706-6776 | F: 410-706-6756