gbif / doc-publishing-dna-derived-data

This guide shows how to publish DNA-derived spatiotemporal biodiversity data and make it discoverable through national and global biodiversity data discovery platforms. It is based on experiences from Australia, Norway, Sweden, UNITE, and GBIF.
https://doi.org/10.35035/doc-vf1a-nr22

sampleSizeValue description is vague #173

Closed sformel-usgs closed 1 year ago

sformel-usgs commented 1 year ago

In section 2.2.1, sampleSizeValue is defined as, "Total number of reads in the sample. This is important since it allows calculating the relative abundance of the sequence variant within the sample."

This is vague in that it doesn't define how much filtering is appropriate before defining total reads. The last sentence in section 1.3.1 is the only other place total reads are mentioned, I think, and has the same ambiguity. My intuition is that you would get a variety of answers if you asked people to define this point in Figure 4. If there is a point of consensus that was reached while developing this document, Figure 4 would be a good place to annotate it.

tobiasgf commented 1 year ago

The description of sampleSizeValue in Table 2 has been changed from "Total number of reads in the sample. This is important since it allows calculating the relative abundance of the sequence variant within the sample." to: "Total number of reads in the sample. This is important since it allows calculating the relative abundance of the sequence variant within the sample. This number should preferably be calculated after universal processing (quality control, ASV denoising, chimera removal, etc.), but before manual/selective removal of e.g. non-target OTUs/ASVs from the dataset."
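
To make the revised wording concrete, here is a minimal sketch of where the total would be taken and how relative abundance follows from it. The ASV names, read counts, and the helper `relative_abundance` are hypothetical and not from the guide. The sketch also assumes that the per-variant read count is what gets published alongside sampleSizeValue (e.g. as organismQuantity).

```python
# Hypothetical per-ASV read counts for ONE sample, taken after universal
# processing (quality control, ASV denoising, chimera removal).
reads_after_processing = {
    "ASV_1": 600,
    "ASV_2": 300,
    "ASV_3": 100,  # assumed non-target; dropped later by manual curation
}

# sampleSizeValue: total reads after universal processing, but BEFORE any
# manual/selective removal of non-target ASVs.
sample_size_value = sum(reads_after_processing.values())

# Manual/selective removal happens afterwards and does not change the total.
non_target = {"ASV_3"}
published = {
    asv: reads
    for asv, reads in reads_after_processing.items()
    if asv not in non_target
}


def relative_abundance(read_count: int, total_reads: int) -> float:
    """Return the share of the sample's reads assigned to one variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return read_count / total_reads


relative = {
    asv: relative_abundance(reads, sample_size_value)
    for asv, reads in published.items()
}

for asv, share in relative.items():
    print(f"{asv}: {published[asv]} / {sample_size_value} reads = {share:.2f}")
```

Under these assumptions the published relative abundances sum to less than 1 whenever non-target reads were removed, because the denominator still counts them.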