Open shaylashahar opened 2 years ago
Are those tax IDs also contained in the nodes.dmp/names.dmp files?
Are you sure there wasn't an issue with the mkbwt/mkfmi steps? It's difficult for me to see what went wrong from your description.
Yes, Tax IDs are contained in nodes.dmp.
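For anyone wanting to double-check this, here is a minimal Python sketch of such a check. It assumes nodes.dmp is in the standard NCBI taxonomy dump layout (fields separated by "|", taxon ID in the first field) and that each FASTA header is either a bare numeric taxon ID or an identifier followed by an underscore and the taxon ID. The function names load_node_ids and missing_taxids are made up for illustration and are not part of Kaiju.

```python
def load_node_ids(nodes_file):
    """Collect the taxon IDs found in the first field of each nodes.dmp line."""
    ids = set()
    for line in nodes_file:
        first_field = line.split("|", 1)[0].strip()
        if first_field:
            ids.add(first_field)
    return ids


def missing_taxids(fasta_file, node_ids):
    """Return the taxon IDs used in FASTA headers that are absent from node_ids.

    Assumption: a header is either ">taxid" or ">something_taxid".
    """
    missing = set()
    for line in fasta_file:
        if line.startswith(">"):
            header = line[1:].strip()
            taxid = header.rsplit("_", 1)[-1]
            if taxid not in node_ids:
                missing.add(taxid)
    return missing
```

It could be used roughly like this: open nodes.dmp and proteins.faa with open(...), pass the file objects to the two functions, and print the result. An empty set would suggest the headers and the taxonomy files agree, though it would not rule out other problems.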
For the next step, I ran: kaiju-mkbwt -n 25 -a ACDEFGHIKLMNPQRSTVWY -o proteins proteins.faa and then ran: kaiju-mkfmi proteins
Looks alright. Not sure how I can help you without your faa file.
I want to create a custom database using GTDB (Genome Taxonomy Database), where the protein sequences are identified by their accession numbers. I wrote a Python script to find the Tax ID for each accession number and rewrite the file to fit kaiju's requirements. I successfully reformatted the file so that each record reads ">Tax ID \n [protein sequence]" and then followed kaiju's directions to run mkbwt and mkfmi.
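The script itself is not shown here, so the following is only a minimal sketch of what that header-rewriting step might look like, not the actual script. It assumes the accession is the first whitespace-delimited token of each original header and that acc_to_taxid is a dict built beforehand (both the dict and the function name rewrite_headers are hypothetical). Records without a known Tax ID are skipped and counted.

```python
def rewrite_headers(fasta_in, fasta_out, acc_to_taxid):
    """Copy FASTA records, replacing each header with its numeric Tax ID.

    acc_to_taxid is a hypothetical mapping {accession: tax ID}.
    Returns (records written, records skipped because no Tax ID was found).
    """
    kept = 0
    skipped = 0
    writing = False  # whether the current record's sequence lines are copied
    for line in fasta_in:
        if line.startswith(">"):
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            taxid = acc_to_taxid.get(accession)
            writing = taxid is not None
            if writing:
                # Header is just the Tax ID, with no surrounding whitespace.
                fasta_out.write(f">{taxid}\n")
                kept += 1
            else:
                skipped += 1
        elif writing:
            fasta_out.write(line)
    return kept, skipped
```

Returning the two counts makes it easy to notice when a large share of the accessions had no Tax ID, which would shrink the database without any visible error.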
I previously ran my metagenomes against nr_euk and ran kaiju2table with no issues. When I ran the same metagenomes against my GTDB database, everything came out as zeros. I looked back at my .fmi and compared it to the .fmi from nr_euk, and they look very different. My .fmi looks like protein sequences one after the next. I'm not sure what went wrong, but I'm pretty sure it happened when creating the .fmi file. Any ideas?