Open HitMonk opened 4 years ago
The only thing that I would be concerned about is the taxonomy. If the GTDB taxonomy does include the viral refseq taxonomy IDs, then it should work just fine.
Bracken also would not be affected by including both of these databases prior to building.
I'm sorry if this is a stupid question, but how can I check whether the GTDB taxonomy has the viral RefSeq taxonomy IDs? Also, is it possible for me to just merge the viral taxonomy IDs with the GTDB taxonomy? I had to merge the archaeal and bacterial taxonomy IDs, as GTDB provides them separately.
You can check whether the taxonomy ID of one of the viral sequences is present in the GTDB taxonomy files.
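A minimal sketch of that check, assuming the GTDB taxonomy has already been converted to an NCBI-style `nodes.dmp` (fields separated by `|`, taxonomy ID in the first column). The function names and the sample lines are made up for illustration; 10239 is the NCBI taxonomy ID for "Viruses".

```python
def read_taxids(lines):
    """Return the set of taxonomy IDs found in nodes.dmp-style lines."""
    taxids = set()
    for line in lines:
        first = line.split("|")[0].strip()
        if first:  # skip blank lines
            taxids.add(int(first))
    return taxids


def missing_taxids(query_taxids, nodes_lines):
    """Return the query taxonomy IDs that are absent from the nodes lines."""
    known = read_taxids(nodes_lines)
    return sorted(t for t in query_taxids if t not in known)


if __name__ == "__main__":
    # Hypothetical stand-in for the GTDB nodes.dmp; in practice pass an open
    # file handle, e.g. open("path/to/gtdb/nodes.dmp").
    sample_nodes = [
        "1\t|\t1\t|\tno rank\t|\n",
        "2\t|\t1\t|\tsuperkingdom\t|\n",
    ]
    print(missing_taxids([10239, 2], sample_nodes))
```

An empty result would mean every queried ID is already present; anything listed would have to be added before building.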
If it's not, you will have to extract all of the taxonomy IDs belonging to viral sequences (and their parent taxids) from the RefSeq taxonomy and merge them with the GTDB taxonomy. (I believe just making sure that "Viruses" is connected to root would be enough.)
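The extraction step could look roughly like this. It is only a sketch under several assumptions: both taxonomies are NCBI-style `nodes.dmp` files (taxonomy ID in column 1, parent ID in column 2), the two files do not reuse the same ID for different taxa, and `names.dmp` gets the same treatment separately. All function names and the taxid 999 are hypothetical.

```python
def read_parents(lines):
    """Map each taxonomy ID to its parent ID from nodes.dmp-style lines."""
    parents = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0]:
            parents[int(fields[0])] = int(fields[1])
    return parents


def with_ancestors(taxids, parents):
    """Return the given taxids plus every ancestor up to the root."""
    keep = set()
    for taxid in taxids:
        # Walk upward until we reach a node already collected; the root is
        # its own parent, so the walk always terminates there.
        while taxid not in keep:
            keep.add(taxid)
            taxid = parents.get(taxid, taxid)
    return keep


def nodes_to_append(refseq_lines, gtdb_lines, viral_taxids):
    """Return RefSeq node lines for the viral lineages missing from GTDB."""
    refseq_lines = [l for l in refseq_lines if l.strip()]
    keep = with_ancestors(viral_taxids, read_parents(refseq_lines))
    existing = set(read_parents(gtdb_lines))
    selected = []
    for line in refseq_lines:
        taxid = int(line.split("|")[0].strip())
        if taxid in keep and taxid not in existing:
            selected.append(line)
    return selected
```

Appending the returned lines to the GTDB `nodes.dmp` keeps "Viruses" (10239) attached to the shared root (1), since the root line itself is already present and is skipped. If the GTDB dump uses its own ID numbering, collisions with NCBI IDs are possible and would need to be resolved first; this sketch does not handle that.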
That makes sense. I'll try this and report back in a week or so. Thank you so much for your help!
Hi! Were you able to achieve this?
@Rohit-Satyam Yep! I used FlextaxD to merge the databases. https://github.com/FOI-Bioinformatics/flextaxd
Hello everyone, I was trying to build Kraken2 databases with GTDB. However, since GTDB consists of only bacterial and archaeal sequences, it would be ideal to build it along with the viral RefSeq. I'm not sure if this is even possible or compatible with Bracken downstream. Please let me know if you have any suggestions.