cazzlewazzle89 / GROND

A quality-checked and publicly available database of full-length 16S-ITS-23S rRNA operon sequences

About GROND and files in Zenodo #1

Open Yesica04 opened 2 weeks ago

Yesica04 commented 2 weeks ago

Hello. I'm checking the files for download and I see some files with names like "refseq207nr_classifierpairA.qza". Are these files for use with qiime2? Why are there also classifierpairB and C files?

Which is the correct database file, [refseq207full] or [refseq207nr]? What is the difference? Thanks.

cazzlewazzle89 commented 1 week ago

Hi @Yesica04

Yes, these are pre-trained naive Bayes taxonomic classifiers that can be imported directly into qiime2.
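To make "naive Bayes taxonomic classifier" more concrete, here is a toy plain-Python sketch of the general idea: each reference sequence is broken into overlapping k-mers, per-taxon k-mer frequencies are learned, and a query is assigned to the taxon with the highest posterior probability. As far as I know, q2-feature-classifier builds its models on scikit-learn's `MultinomialNB` over k-mer features, but this sketch is not that implementation. The class `KmerNaiveBayes`, the `k` and `alpha` defaults, and the toy sequences and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """Overlapping k-mers of a sequence (upper-cased)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Toy multinomial naive Bayes over k-mer counts (hypothetical helper)."""

    def __init__(self, k=7, alpha=1.0):
        self.k = k          # k-mer length (assumed, not GROND's setting)
        self.alpha = alpha  # additive (Laplace/Lidstone) smoothing

    def fit(self, seqs, labels):
        self.kmer_counts = defaultdict(Counter)  # taxon -> k-mer counts
        self.kmer_totals = Counter()             # taxon -> total k-mers
        self.class_sizes = Counter(labels)       # taxon -> number of reference seqs
        vocab = set()
        for seq, label in zip(seqs, labels):
            km = kmers(seq, self.k)
            self.kmer_counts[label].update(km)
            self.kmer_totals[label] += len(km)
            vocab.update(km)
        self.vocab_size = len(vocab)
        self.n_seqs = len(labels)
        return self

    def log_posteriors(self, seq):
        """Normalised log posterior for every taxon seen during fit."""
        query = Counter(kmers(seq, self.k))
        scores = {}
        for label, size in self.class_sizes.items():
            score = math.log(size / self.n_seqs)  # log prior
            denom = self.kmer_totals[label] + self.alpha * self.vocab_size
            for kmer, count in query.items():
                score += count * math.log((self.kmer_counts[label][kmer] + self.alpha) / denom)
            scores[label] = score
        # log-sum-exp so that the posteriors sum to 1
        top = max(scores.values())
        log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
        return {label: s - log_z for label, s in scores.items()}

    def predict(self, seq):
        """Return (best taxon, posterior probability of that taxon)."""
        post = self.log_posteriors(seq)
        best = max(post, key=post.get)
        return best, math.exp(post[best])


# Usage with made-up reference sequences and labels
refs = ["ACGTACGTACGTACGT", "ACGTACGAACGTACGT", "TTGGCCAATTGGCCAA", "TTGGCCAATTGGCCTA"]
taxa = ["g__TaxonA", "g__TaxonA", "g__TaxonB", "g__TaxonB"]
model = KmerNaiveBayes(k=4).fit(refs, taxa)
print(model.predict("ACGTACGTACGTAC"))
```

The posterior of the winning taxon plays the role of a confidence score. Real classifiers usually report it per taxonomic rank and truncate the assignment where confidence drops below a threshold, which this toy does not attempt.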

The pair[ABC] models are trained on the regions extracted from the full rrn sequences using the primer pairs evaluated in the GROND manuscript. The qiime2 tutorial gives more information, but in short, you should train a classifier on sequences extracted with the same primer pair that was used to generate the amplicons in your experiment.
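Why the primers matter can be illustrated with a simplified in-silico PCR: find the forward primer on a full-length sequence, find the reverse complement of the reverse primer downstream of it, and keep what lies between. This plain-Python sketch is not what qiime2's read extraction does internally (real tools tolerate mismatches and may search both strands). It assumes exact matches apart from IUPAC degenerate bases, that sequences share the forward primer's orientation, and that primers are excluded from the output. The function names and the toy primers are hypothetical and are not the GROND primer pairs.

```python
import re

# IUPAC nucleotide codes as regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, IUPAC-aware."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(seq, f_primer, r_primer):
    """Region between the primers (primers excluded), or None if either is absent.

    Assumes exact matches (apart from degenerate bases) and that `seq` is in
    the same orientation as the forward primer.
    """
    seq = seq.upper()
    fwd = re.search(primer_regex(f_primer), seq)
    if fwd is None:
        return None
    # The reverse primer binds the opposite strand, so look for its reverse
    # complement downstream of the forward primer
    rev = re.search(primer_regex(revcomp(r_primer)), seq[fwd.end():])
    if rev is None:
        return None
    return seq[fwd.end():fwd.end() + rev.start()]


# Usage with a made-up sequence and made-up degenerate primers
operon = "GGGACGTTGAAACCCGGGCATGCATTT"
print(extract_region(operon, "ACGYTG", "TGCRTG"))
```

A classifier trained on regions cut out this way only ever sees the k-mers between those primers, which is why it should match the primer pair used for the amplicons being classified.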

Both the full and nr files are correct, but I would recommend using the nr version (with the taxRep taxonomy labels) for the sake of speed. It is constructed from the full dataset but dereplicated to remove highly similar sequences. For more information see the Database construction and Taxonomy sections in the manuscript.
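Dereplication of highly similar sequences can be sketched as greedy clustering: process sequences longest first, and keep each one as a new representative only if it is not at least `threshold` similar to an existing representative. This is an assumed illustration and not necessarily how the nr set was built; the Database construction section of the manuscript describes that. Here `difflib.SequenceMatcher.ratio()` stands in for a proper alignment identity and is slow on full-length operons. The helpers `similarity` and `dereplicate` and the 0.99 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough pairwise similarity in [0, 1] (a proxy, not alignment identity).

    autojunk=False matters: for sequences of 200+ characters the default
    heuristic ignores very frequent elements, which for DNA means every base.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def dereplicate(records, threshold=0.99):
    """Greedy dereplication.

    records: dict of sequence id -> sequence.
    Returns (representatives, membership), where membership maps every id to
    the id of the representative it was assigned to.
    """
    reps = {}
    membership = {}
    # Longest first (ties broken by id) so representatives tend to be the most complete
    for seq_id, seq in sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        for rep_id, rep_seq in reps.items():
            if similarity(seq, rep_seq) >= threshold:
                membership[seq_id] = rep_id
                break
        else:
            # No sufficiently similar representative: this sequence starts a new one
            reps[seq_id] = seq
            membership[seq_id] = seq_id
    return reps, membership


# Usage with made-up sequences
toy = {"a": "ACGT" * 10, "b": "ACGT" * 10, "c": "TTTTGGGGCCCCAAAA" * 2}
reps, members = dereplicate(toy)
print(sorted(reps), members)
```

The speed gain comes from the classifier being trained on fewer, non-redundant sequences; the membership map is what lets each removed sequence be traced back to the representative that replaced it.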

If you want a more detailed comparison of taxonomic profiling methods for 16S-ITS-23S data then check out our preprint.

All the best, Calum