benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Compatible database for soil nematodes #1218

Closed by mlvermeire 3 years ago

mlvermeire commented 3 years ago

Dear benjjneb,

Thank you very much for this great pipeline! We are currently developing a metabarcoding approach to characterize nematode communities in our soil samples, and we would like to construct a soil nematode database to use in the pipeline.

I would like to ask you some questions about the steps needed to create a database in the appropriate format. We have a list of sequences and the corresponding taxonomy. How should we format this data for use with the IdTaxa or assignTaxonomy functions?

Thank you very much in advance for your time and consideration. I wish you a very good day.

Kind regards,

Marie-Liesse

benjjneb commented 3 years ago

Hi Marie-Liesse, I suspect the relevant information is in the description of the assignTaxonomy training fasta format here: https://benjjneb.github.io/dada2/training.html#formatting-custom-databases
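As a rough illustration of that format, here is a minimal Python sketch that turns a list of sequences and their taxonomy into a training FASTA. It assumes the layout described on the linked page: each header line holds the taxonomic ranks separated by semicolons, with a trailing semicolon, and sequences not classified to the lowest rank simply have a shorter header. The function name `format_training_fasta`, the example ranks, and the toy sequences are hypothetical placeholders, not part of dada2.

```python
# Sketch only: build a taxonomy-annotated FASTA string for assignTaxonomy.
# Assumption: headers look like ">Rank1;Rank2;...;" (semicolon-delimited,
# trailing semicolon), as described on the dada2 training-data page.

def format_training_fasta(records):
    """records: iterable of (sequence, [rank1, rank2, ...]) pairs."""
    lines = []
    for sequence, ranks in records:
        kept = [rank.strip() for rank in ranks]
        # Drop empty trailing ranks so partially classified entries get a shorter header.
        while kept and not kept[-1]:
            kept.pop()
        if not kept:
            raise ValueError("each sequence needs at least one taxonomic rank")
        # Reject gaps in the middle of the lineage and names that contain the delimiter.
        if any(not rank or ";" in rank for rank in kept):
            raise ValueError(f"invalid lineage: {ranks!r}")
        lines.append(">" + ";".join(kept) + ";")
        lines.append(sequence.strip().upper())
    return "\n".join(lines) + "\n"


# Hypothetical records: made-up sequences with illustrative nematode lineages.
example_records = [
    ("acgtacgt", ["Eukaryota", "Nematoda", "Chromadorea",
                  "Rhabditida", "Cephalobidae", "Acrobeloides"]),
    ("ttgacc", ["Eukaryota", "Nematoda", "Chromadorea", "", "", ""]),
]
fasta_text = format_training_fasta(example_records)
```

Writing `fasta_text` to a file (for example with `open(path, "w")`) should give something that can be passed as the reference file to assignTaxonomy, provided every sequence uses the same rank order.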

For IdTaxa, I have not tried to create a custom reference for that function myself. IdTaxa is defined in the DECIPHER R package, which has documentation here: https://bioconductor.org/packages/release/bioc/html/DECIPHER.html

See the "Classify sequences" document linked there, specifically Section 3 on training the classifier. @digitalwright Is there any additional online documentation that might help here?

digitalwright commented 3 years ago

@benjjneb, the Classifying Sequences vignette is the correct place to start.

@mlvermeire, please send me an email if you need more help.

Erik

mlvermeire commented 3 years ago

Dear @benjjneb and @digitalwright, thank you very much for your answers! I'll have a look at the documentation you sent and will keep you posted. I wish you a very good week. Kind regards