gbif / doc-publishing-dna-derived-data

This guide shows how to publish DNA-derived spatiotemporal biodiversity data and make it discoverable through national and global biodiversity data discovery platforms. Based on experiences from Australia, Norway, Sweden, UNITE, and GBIF.
https://doi.org/10.35035/doc-vf1a-nr22

Add field for denoising method? #147

Open andand opened 2 years ago

andand commented 2 years ago

Sorry for bringing this up so late, and we may have discussed it during the writing, but now that I'm submitting metabarcoding data it has become obvious that a field specifying the sequence denoising method is missing, not only from the guide but also from the DNA derived data extension. Since denoising (generating ASVs) is widely used in metabarcoding and is well described in the guide, it is a bit odd that the method used for this step has no field in either.

Other steps of the bioinformatic processing of metabarcoding data are included in the extension, such as taxonomic annotation (tax_class, at least if one doesn't read the description in too much detail: "used to classify new genomes"...) and chimera removal (chimera_check), but not the very central step of denoising. At the same time, the extension includes a multitude of fields for the different steps of analysing e.g. metagenome-assembled genomes. This is good, but it makes the missing description of the denoising method for metabarcoding data more obvious and problematic. Maybe we decided that "otu_seq_comp_appr" should also cover this, but I think it would be much better to have a specific term for denoising, since it is not the same thing.

I discussed this with @pragermh and @erikrikarddaniel, and we suggest adding a new term "denoising_appr" with the description "Tool and settings used to denoise amplicon sequences / generating ASVs" or something along those lines.
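
To make the suggestion concrete, here is a rough sketch of how a row in the extension file could look with the proposed field filled in. Only the term names chimera_check, otu_seq_comp_appr and denoising_appr come from this issue; the `occurrenceID` link column, the helper names and all the values (tool names, versions, settings) are assumptions made up for illustration, and denoising_appr itself is only a proposal, not an existing term.

```python
import csv
import io

# Columns for a hypothetical extension file. "denoising_appr" is the term
# proposed in this issue and does not exist in the extension; "occurrenceID"
# is an assumed column linking the row to its core record.
FIELDS = ["occurrenceID", "chimera_check", "otu_seq_comp_appr", "denoising_appr"]


def extension_row(occurrence_id, chimera_check, otu_seq_comp_appr, denoising_appr):
    """Build one extension row as a dict keyed by the column names."""
    return {
        "occurrenceID": occurrence_id,
        "chimera_check": chimera_check,
        "otu_seq_comp_appr": otu_seq_comp_appr,
        "denoising_appr": denoising_appr,
    }


def to_tsv(rows):
    """Serialise rows to tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative values only; none of them are taken from a real dataset.
row = extension_row(
    occurrence_id="sample01:asv0001",
    chimera_check="removeBimeraDenovo (DADA2 1.16), method=consensus",
    otu_seq_comp_appr="",
    denoising_appr="DADA2 1.16; maxEE=2, truncLen=240",
)
tsv = to_tsv([row])
print(tsv)
```

Keeping denoising in its own column leaves otu_seq_comp_appr free for datasets that actually cluster sequences into OTUs.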

CecSve commented 2 years ago

I also struggled with reporting the ASV processing, and I think the missing field can lead to unstandardised reporting of the bioinformatics used. So far, for example, I have used both the occurrence core and the DNA-derived extension to record my methods (the 'identification' fields and 'chimera_check'), although I have not published the dataset yet.

I support the idea of a new term 'denoising_appr'. Perhaps the description could also cover links to scripts or protocols, e.g.: "Tool and settings, and/or published scripts or protocols, used to denoise amplicon sequences / generating ASVs"