DntBScrdDv opened this issue 3 months ago
Kraken 2 / Bracken 16S rRNA indexes are available for Greengenes, RDP, and Silva.
https://benlangmead.github.io/aws-indexes/k2
Does this help?
Hi @ChillarAnand,
Many thanks for your reply. Unfortunately, no, this doesn't help. The Kraken2 RDP and Silva databases only go down to genus level, and Greengenes has not been updated since 2016, I think.
Hi @DntBScrdDv, I have built an unofficial strain-level version of the RDP database for my research:
https://www.bioinformatics.uni-muenster.de/tools/metag/download
Maybe you can get it to work with Kraken2, but I expect it will take quite a bit of tinkering. If you use the database in your research, I would appreciate it if you cited my preprint, which is linked on the download page. Have a nice day, Felix
Many thanks for this, @Username-felix-is-not-available,
I'm sorry, but could you briefly explain what all the different files are? Is the .fa file the sequences? And what is the giant .suf file?
Thanks!
You are very welcome, @DntBScrdDv. For your purposes, you can ignore all files except "RDP16s28s.fa" (the sequences) and "tax.RDP16s28s.txt" (the taxonomy). The other files either provide metadata or are specific to the LAST alignment program, which I used for my project.
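To make the pairing of those two files concrete, here is a minimal Python sketch that joins each FASTA record to its taxonomy string by sequence ID. It assumes, without having checked the real files, that each line of the taxonomy file is a sequence ID, a tab, then a ';'-separated lineage, and that the ID is the first whitespace-delimited token of the FASTA header. All function names are hypothetical, and the toy records are made up.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def fasta_ids(handle: TextIO) -> Iterator[str]:
    """Yield the sequence ID (first whitespace-delimited token) of each FASTA header."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].split()[0]


def load_taxonomy(handle: TextIO) -> Dict[str, str]:
    """Map sequence ID -> lineage string.

    ASSUMPTION: each line is '<seq_id>', a tab, then a ';'-separated lineage.
    Check the real tax.RDP16s28s.txt before relying on this.
    """
    taxonomy: Dict[str, str] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        seq_id, lineage = line.split("\t", 1)
        taxonomy[seq_id] = lineage
    return taxonomy


def pair_sequences(fasta: TextIO, tax: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (seq_id, lineage) per FASTA record; '' if the ID has no taxonomy entry."""
    lookup = load_taxonomy(tax)
    for seq_id in fasta_ids(fasta):
        yield seq_id, lookup.get(seq_id, "")


if __name__ == "__main__":
    # Toy, made-up input standing in for RDP16s28s.fa and tax.RDP16s28s.txt.
    fasta = io.StringIO(">seq1 description\nACGT\n>seq2\nGGCC\n")
    tax = io.StringIO("seq1\tBacteria;Firmicutes;Bacilli\n")
    for seq_id, lineage in pair_sequences(fasta, tax):
        print(seq_id, lineage or "<no taxonomy>")
```

With the real files, the two `io.StringIO` objects would be replaced by `open("RDP16s28s.fa")` and `open("tax.RDP16s28s.txt")`. Returning an empty string for unmatched IDs (rather than raising) makes it easy to count how many sequences lack a taxonomy entry.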
I hope this will not send you down a rabbit hole, because Kraken2 handles taxonomy files very differently from the way I did. In my files, you use the sequence ID from the FASTA file to look up the matching taxonomy string in the taxonomy file, and that string contains the full lineage. Kraken2 instead uses taxonomy IDs and splits the lineage into individual taxa (see the names.dmp and nodes.dmp files in a Kraken2 database). For the special databases, it is best to assume that these IDs are not identical to NCBI taxonomy IDs (i.e. they are artificial).

I think translating my files into Kraken2 format could be very difficult. It may be easier to take the logic in my script and add it to Kraken2's build_rdp_taxonomy.pl. The logic is described here (Supplementary Methods 4.2) in more general terms. However, I don't know what downstream effects this would have. My automated approach to fixing the taxonomy is also not foolproof, and I am not a taxonomist by training, so there is some room for improvement. If you come up with a better approach, please let me know.
Best, Felix
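The difference between the two taxonomy models can be sketched in Python: each distinct lineage prefix gets an artificial taxid, a parent, and a rank, which is roughly the information nodes.dmp and names.dmp carry. This is a minimal sketch under several assumptions: the input is (sequence ID, ';'-separated lineage) pairs, the lineage follows the fixed rank order in the hypothetical `RANKS` list, and the output rows imitate the NCBI taxdump layout (fields separated by tab-pipe-tab). It is not Felix's script, it does not apply his taxonomy fixes, and it is not what build_rdp_taxonomy.pl does; compare its output with a real Kraken2 taxonomy before trying to build with it.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple

# ASSUMPTION: rank order of the ';'-separated lineage; adjust to the real files.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]
ROOT_ID = 1  # NCBI convention: the root is taxid 1 and is its own parent


def build_taxonomy(pairs: Iterable[Tuple[str, str]]):
    """Assign artificial taxids to every distinct lineage prefix.

    Returns (nodes, names, seqid2taxid):
      nodes:       taxid -> (parent taxid, rank)
      names:       taxid -> scientific name
      seqid2taxid: seq_id -> taxid of the deepest taxon in its lineage
    """
    nodes: Dict[int, Tuple[int, str]] = {ROOT_ID: (ROOT_ID, "no rank")}
    names: Dict[int, str] = {ROOT_ID: "root"}
    # Key by the full prefix, not the bare name, so identical names under
    # different parents (homonyms) get separate taxids.
    prefix_to_id: Dict[Tuple[str, ...], int] = {(): ROOT_ID}
    seqid2taxid: Dict[str, int] = {}
    next_id = ROOT_ID + 1
    for seq_id, lineage in pairs:
        taxa: List[str] = [t.strip() for t in lineage.split(";") if t.strip()]
        parent = ROOT_ID
        for depth in range(len(taxa)):
            prefix = tuple(taxa[: depth + 1])
            if prefix not in prefix_to_id:
                rank = RANKS[depth] if depth < len(RANKS) else "no rank"
                prefix_to_id[prefix] = next_id
                nodes[next_id] = (parent, rank)
                names[next_id] = taxa[depth]
                next_id += 1
            parent = prefix_to_id[prefix]
        seqid2taxid[seq_id] = parent  # stays at the root if the lineage was empty
    return nodes, names, seqid2taxid


def write_kraken_files(nodes, names, seqid2taxid,
                       nodes_fh: TextIO, names_fh: TextIO, map_fh: TextIO) -> None:
    """Write NCBI-style nodes/names rows (tab-pipe-tab separated) and a seq_id<TAB>taxid map."""
    for taxid in sorted(nodes):
        parent, rank = nodes[taxid]
        nodes_fh.write(f"{taxid}\t|\t{parent}\t|\t{rank}\t|\t-\t|\n")
    for taxid in sorted(names):
        names_fh.write(f"{taxid}\t|\t{names[taxid]}\t|\t-\t|\tscientific name\t|\n")
    for seq_id, taxid in seqid2taxid.items():
        map_fh.write(f"{seq_id}\t{taxid}\n")


if __name__ == "__main__":
    # Made-up lineages for illustration only.
    pairs = [("s1", "Bacteria;Firmicutes;Bacilli"),
             ("s2", "Bacteria;Firmicutes;Clostridia")]
    nodes, names, seq_map = build_taxonomy(pairs)
    out_nodes, out_names, out_map = io.StringIO(), io.StringIO(), io.StringIO()
    write_kraken_files(nodes, names, seq_map, out_nodes, out_names, out_map)
    print(out_nodes.getvalue(), out_names.getvalue(), out_map.getvalue(), sep="\n")
```

Keying nodes by the whole lineage prefix rather than by name keeps homonyms apart, which a name-keyed lookup would silently merge. The taxids are assigned in input order, so they are only stable if the input order is; this matches the point above that such IDs are artificial and unrelated to NCBI taxonomy IDs.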
Hi all,
I need a species-level 16S database. I had relied on the RDP database on other analysis platforms (e.g. FROGS), but the pre-built Kraken2 RDP database only goes down to genus level.
I built a database from RefSeq, but it is missing many key taxa (e.g. Candidatus Omnitrophus).
Does anyone know of a species-level 16S database for Kraken2 that is broader than RefSeq, e.g. a species-level Silva or RDP database?
Many thanks