gbif / doc-publishing-dna-derived-data

This guide shows how to publish DNA-derived spatiotemporal biodiversity data and make it discoverable through national and global biodiversity data discovery platforms. Based on experiences from Australia, Norway, Sweden, UNITE, and GBIF.
https://doi.org/10.35035/doc-vf1a-nr22

Lack of a critique of sequence databases #37

Closed qgroom closed 3 years ago

qgroom commented 4 years ago

There is no critique of sequence databases (ENA, GenBank) and the associated data they hold. Their poor data structure makes linking data much harder, and they are largely useless for tracking provenance. I was expecting at least some comment on this.

andersfi commented 3 years ago

Does this issue point to a missing motivation for bringing sequence data into GBIF (i.e., ENA/GenBank alone is not enough), or to the need for a general warning about the ENA/GenBank data structure?

@NewBeHenrik will write a sentence on this and e-mail it to @thomasstjerne for inclusion in the text, after which the issue can be closed.

puh32 commented 3 years ago

Under "Taxonomy of sequences", after the sentence "The accuracy and precision of such sequence annotation will depend on the availability of reliable reference databases and libraries across all branches of the tree of life, which in turn will require joint efforts from taxonomists and molecular ecologists.", I propose that we add the following sentence:

"Public sequence databases should always be used with the awareness that they suffer from various shortcomings related to, e.g., taxonomic reliability and lack of standardized metadata vocabularies (Hofstetter et al. 2019; Durkin et al. 2020)."

References:

https://mycokeys.pensoft.net/article/56691/
https://link.springer.com/article/10.1007/s13225-019-00428-3