gbif / doc-publishing-dna-derived-data

This guide shows how to publish DNA-derived spatiotemporal biodiversity data and make it discoverable through national and global biodiversity data discovery platforms. Based on experiences from Australia, Norway, Sweden, UNITE, and GBIF.
https://doi.org/10.35035/doc-vf1a-nr22

Discussion on metagenomics? #82

Closed SSuominen1 closed 3 years ago

SSuominen1 commented 4 years ago

This is focused on metabarcoding studies, but metagenomics is also briefly mentioned. I guess adding information from metagenomics datasets is another challenge; maybe there could be some mention of plans for this?

dschigel commented 3 years ago

Good point. This needs at least i) checking if both m-words are in the glossary @pragermh and ii) adding a few lines to the introduction. Would you suggest a couple of lines, @abissett? Unless we firmly believe that species detection by full genomes is out of scope.

abissett commented 3 years ago

Andrew to write a short section on metagenomics, describing this and stating that this document is based on current practice (barcoding), but that the same general principles around publishing a sequence apply.

abissett commented 3 years ago

Needs to be read by Daniel and Frode to ensure I've covered all we discussed.

Insert the below as 1.3.3

Change the subsequent numbering to allow the insertion

1.3.3. Metagenomic: sequence-derived data

Sequence-derived diversity data may also be generated using amplification-free metagenomic methods, whereby all DNA in a sample is targeted for sequencing (add this cite here: Tyson, Gene W. and Hugenholtz, Philip (2005). Environmental shotgun sequencing. Encyclopedia of genetics, genomics, proteomics, and bioinformatics. Edited by Lynn B. Jorde. West Sussex, U.K.: John Wiley & Sons. 1386-1391. https://doi.org/10.1002/047001153X.g205313), rather than specific amplicons or barcodes, as described above. Sequence-derived diversity data obtained from metagenomic sequencing can be in the form of sequence matches to annotated gene databases (as above) or as (near) complete metagenome-assembled genomes (MAGS). While metabarcoding methods still dominate in terms of sequence-derived diversity information, metagenomic data is becoming more important, as evidenced by the growing number of MAGS and their utility in informing phylogeny and taxonomy (Add this cite https://www.nature.com/articles/s41587-020-0501-8). While we recognise , discussion of the rapidly evolving methods associated with metagenome analysis is beyond the scope of this document. This document uses metabarcoding as the model for discussion around concepts and methods for publishing sequence-derived diversity data, and while the bioinformatic pathways will differ for metagenomic data, the end result (a sequence, often in the form of a contig/assembly) is congruent with the concepts suggested for metabarcoding data (i.e., sample-specific, sample collection, data generation and processing workflow metadata should be captured).

erikrikarddaniel commented 3 years ago

Fine, with two minor comments.

  1. I'd prefer "MAGs", i.e. plural s not capitalised, though I don't know how the rest of the document looks.
  2. Replace the comma after "While we recognise" with a "that"

dschigel commented 3 years ago

@FFossoy please take a look. Hope after Daniel's and maybe your edits this can be tagged do&close, processed and closed.

FFossoy commented 3 years ago

All fine with me. @abissett can tag this as do&close.

thomasstjerne commented 3 years ago

Great - I will "do and close" then