GenomicsStandardsConsortium / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Widen sop [MIXS:0000090] definition #745

Open LynnDelgat opened 6 months ago

LynnDelgat commented 6 months ago

Current term details

Please supply the current details of the term that you would like to update:

Term name - sop
Term ID - [if known] MIXS:0000090
Structured comment name - 
Definition - Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences
Expected value - reference to SOP
Value syntax -
Example -
Preferred unit - 
Package(s) - 
Relationship to other MIXS terms [if applicable] -

Suggested update(s)

Please supply the new suggestions for any of the details listed below (only insert text for those details that should be updated):

Term name - 
Structured comment name - 
Definition - Standard operating procedures used in assembly, bioinformatic processing and/or annotation of genomes, metagenomes or (environmental) sequences
Expected value - 
Value syntax -
Example -
Preferred unit - 
Package(s) - 
Relationship to other MIXS terms [if applicable] -

Additional context

The current definition is too narrow for (meta)barcoding purposes: it covers only assembly and annotation protocols, and therefore excludes, for example, the processing of raw reads into processed reads for (meta)barcoding (which includes steps such as demultiplexing, filtering, clustering and denoising). I've suggested adding "bioinformatic processing" to the definition to cover this, but if anyone knows a better way to formulate this, please chip in.

In addition, "genomes, metagenomes or environmental sequences" might exclude some cases (e.g. barcoding of non-environmental samples), so I suggest putting "environmental" in brackets. Alternatively, "environmental sequences" could be replaced by "reads".

The main idea behind this proposed change is that the sop field should be able to contain SOPs for any kind of bioinformatic processing of any kind of sequence.
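
To make the gap concrete, here is a rough sketch of which processing steps fall outside the current wording and which the proposed wording would cover. The scope labels and the step-to-activity mapping are my own illustrative assumptions, not MIxS vocabulary, and the function name is hypothetical.

```python
# Illustrative only: these labels paraphrase the two definitions in this issue;
# they are assumptions, not controlled MIxS terms.
CURRENT_SCOPE = {"assembly", "annotation"}
PROPOSED_SCOPE = {"assembly", "bioinformatic processing", "annotation"}

# Hypothetical mapping of workflow steps to the activity they fall under.
STEP_ACTIVITY = {
    "demultiplexing": "bioinformatic processing",
    "filtering": "bioinformatic processing",
    "clustering": "bioinformatic processing",
    "denoising": "bioinformatic processing",
    "genome assembly": "assembly",
    "gene annotation": "annotation",
}


def uncovered_steps(scope):
    """Return, sorted by name, the steps whose activity is not named in `scope`."""
    return sorted(
        step for step, activity in STEP_ACTIVITY.items() if activity not in scope
    )


# Steps an SOP could not be recorded for under each wording.
print("current: ", uncovered_steps(CURRENT_SCOPE))
print("proposed:", uncovered_steps(PROPOSED_SCOPE))
```

Under these assumptions, the four (meta)barcoding steps are left uncovered by the current wording, while the proposed wording leaves none uncovered.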