Sebastian-Mynott closed this 5 years ago
Glad it's helpful for you!
Have you seen the entry on formatting custom databases on our website?
Does that answer your question? Or is it something not covered there?
Yes, I've seen that entry. That same page links to ready-made collections for SILVA and others, but I'm looking for instructions on formatting data from other sources, such as NCBI and BOLD.
Basically, whatever the source, it needs to be distilled into a FASTA file with the expected format:

```
>Level1;Level2;Level3;Level4;Level5;Level6;
ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
>Level1;Level2;Level3;Level4;Level5;
CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC
```
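Before pointing dada2 at a home-made reference file, it can help to check that every record actually follows the layout above. A minimal Python sketch of such a check is below. The function name `parse_reference_fasta` and the allowed character set (IUPAC nucleotide codes) are illustrative assumptions, and treating the trailing `;` as mandatory simply mirrors the example headers; it is a strictness choice, not necessarily something dada2 itself enforces.

```python
# Allowed sequence characters: IUPAC nucleotide codes (assumption, adjust as needed).
VALID_BASES = set("ACGTURYSWKMBDHVN")


def parse_reference_fasta(text):
    """Parse reference FASTA text into (levels, sequence) pairs.

    Raises ValueError if a record does not match the expected layout:
    a header of semicolon-separated levels ending in ';', followed by
    one or more sequence lines.
    """
    records = []
    header, seq_parts = None, []

    def finish():
        # Header must be semicolon-separated levels with a trailing ';'.
        if not header.endswith(";"):
            raise ValueError(f"header does not end with ';': {header!r}")
        levels = header[:-1].split(";")
        if any(not level.strip() for level in levels):
            raise ValueError(f"empty taxonomic level in header: {header!r}")
        # Sequences may be wrapped over several lines; join them.
        seq = "".join(seq_parts).upper()
        if not seq:
            raise ValueError(f"no sequence for header: {header!r}")
        bad = set(seq) - VALID_BASES
        if bad:
            raise ValueError(f"unexpected characters {sorted(bad)} in {header!r}")
        records.append((levels, seq))

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                finish()  # close out the previous record
            header, seq_parts = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            seq_parts.append(line)
    if header is not None:
        finish()
    return records
```

Records with fewer levels, like the second example header, are accepted. The number of levels per record is just `len(levels)`, which makes it easy to see how deep the taxonomy goes for each reference sequence.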
Exactly how to do that involves some parsing and reformatting that depends on the format of the database you are starting from, so there's no simple general answer. That part can be done in R or with non-R tools, such as shell scripts or Python.
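As one illustration of that parsing step, here is a rough Python sketch for a source exported as a tab-separated table with one row per sequence (some databases, BOLD among them, offer tabular downloads). The function name and the column names are placeholders, not any database's real schema, so check the header row of your own export and pass the matching column names. Removing `-` gap characters assumes the sequence column may contain alignment gaps. The lineage is cut at the first empty rank, which produces shorter headers like the second record in the example format above. NCBI-style data, where taxonomy lives separately from the sequences, would need a different join step that this sketch does not cover.

```python
import csv
import io


def table_to_dada2_fasta(tsv_text, rank_columns, seq_column):
    """Convert a tab-separated table (one row per sequence) into
    reference FASTA text with semicolon-separated taxonomy headers.

    rank_columns: column names ordered from highest to lowest rank
                  (placeholders; use the names in your own file).
    seq_column:   name of the column holding the nucleotide sequence.
    """
    out_lines = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Strip alignment gaps and spaces; skip rows without a sequence.
        seq = (row.get(seq_column) or "").replace("-", "").replace(" ", "").upper()
        if not seq:
            continue
        levels = []
        for col in rank_columns:
            value = (row.get(col) or "").strip()
            if not value:
                break  # stop at the first unassigned rank
            levels.append(value)
        if not levels:
            continue  # no usable taxonomy for this row
        out_lines.append(">" + ";".join(levels) + ";")
        out_lines.append(seq)
    return "\n".join(out_lines) + ("\n" if out_lines else "")
```

The returned text can be written to a file and that path given as the `refFasta` argument of dada2's `assignTaxonomy` in R. Taxon names that themselves contain `;` would break the header format, so it may be worth cleaning those before conversion.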
Thank you very much for dada2! It is my new favourite method for working with MiSeq reads!
I do environmental metabarcoding, so I need to create custom databases for community analysis. Do you have any tips, instructions, or a tutorial on how to download and format custom databases within R?
Many thanks!