GenomicsStandardsConsortium / mixs

Minimum Information about any (X) Sequence (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Update term definition: `assembly_software`, or new term `assembly_method` #596

Open jfy133 opened 1 year ago

jfy133 commented 1 year ago

Context:

In ancient DNA (aDNA) studies, a popular area of research is the recovery of ancient pathogen genomes (e.g. Yersinia pestis), i.e. the genomes of dead organisms whose cellular biomass has largely disintegrated. However, the degraded DNA of the microbial cells can still be preserved (e.g. bound to skeletal mineral), and reconstruction of the (majority of the) genome is possible.

There is a debate in MInAS about where such a sample/sequence fits into the MIxS schema. My initial reaction was that it would go into MIGSBacteria (as it's a single genome, not a whole metagenome).

Problem

As ancient DNA fragments are very short, de novo assembly methods do not work, so many researchers currently opt for reference-based mapping of the short reads to recover a genome-length consensus sequence.
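To make the reference-based route concrete, here is a deliberately simplified majority-rule consensus sketch in Python. It assumes the reads have already been mapped and are supplied as hypothetical (start, sequence) pairs with no indels; the function name and the thresholds `min_depth` and `min_fraction` are illustrative only. Real aDNA pipelines also account for base and mapping qualities and post-mortem damage, which this sketch ignores.

```python
from collections import Counter


def majority_consensus(ref_length, alignments, min_depth=3, min_fraction=0.5):
    """Call a simple majority-rule consensus from ungapped read placements.

    Hypothetical helper for illustration. `alignments` is an iterable of
    (start, sequence) pairs, where `start` is the 0-based reference position
    at which the read begins. Indels, base qualities and aDNA damage
    patterns are deliberately ignored.
    """
    # One base counter per reference position (a minimal "pileup").
    pileup = [Counter() for _ in range(ref_length)]
    for start, seq in alignments:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < ref_length and base in "ACGT":
                pileup[pos][base] += 1

    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too little coverage to call a base
            continue
        base, n = counts.most_common(1)[0]
        # Call the top base only if it exceeds min_fraction of covering reads.
        consensus.append(base if n / depth > min_fraction else "N")
    return "".join(consensus)
```

For example, `majority_consensus(5, [(0, "ACGT"), (0, "ACGA"), (1, "CGT"), (2, "GT")], min_depth=2)` yields `"ACGTN"`: position 3 is called as `T` by 3 of 4 reads, and position 4 has no coverage. The result is a genome-length sequence produced without any de novo assembly step, which is why it is unclear whether it counts as an "assembly" in the MIxS sense.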

aDNA studies use reference- and consensus-based genome reconstruction. MIGSBacteria has a field for assembly_software, but it is unclear how "assembly" is defined: is it purely de novo, or are reference-based mappings allowed?

Possible solutions

  1. Update the assembly_software term definition to specify that it can cover any form of assembly
  2. Add a new (presumably enumerated) term called assembly_method (suggested by @only1chunts) that specifies whether the assembly was de novo or reference-based
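
A minimal sketch of option 2 in Python, assuming the new term is enumerated with just two permissible values. The names `AssemblyMethod` and `validate_assembly_record`, and the exact value strings, are hypothetical and not part of the MIxS specification; the sketch only shows how an enumerated term would let a validator reject free-text values such as "denovo".

```python
from enum import Enum


class AssemblyMethod(Enum):
    """Hypothetical permissible values for a proposed assembly_method term."""

    DE_NOVO = "de novo"
    REFERENCE_BASED = "reference-based"


def validate_assembly_record(record):
    """Check a sample record's assembly fields and return a list of problems.

    Hypothetical validator: `record` is a plain dict of term name -> value.
    """
    problems = []
    allowed = {m.value for m in AssemblyMethod}
    method = record.get("assembly_method")
    if method not in allowed:
        problems.append(
            f"assembly_method must be one of {sorted(allowed)}, got {method!r}"
        )
    if not record.get("assembly_software"):
        problems.append("assembly_software is missing")
    return problems
```

With this design, option 1 and option 2 are complementary: assembly_software keeps recording the tool, while assembly_method records how the sequence was reconstructed.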

genomewalker commented 1 year ago

This may be useful for the discussion: NCBI has the following package. For example, SAMN08018243 has the following SRX3572926