Piplopp closed this issue 2 years ago.
Have you had any luck with this? I have a similar problem; however, mine does report missing-taxonomy-ID errors, even though those IDs exist in the acc2id.map, nodes.dmp and names.dmp files.
Same problem here, with a custom-built database. The seqID gets assigned, but without a taxID, even though the taxID is present in the .map and .dmp files. I haven't found a solution or explanation yet.
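Since several people report taxIDs that seem to be present in the .map and .dmp files but still come back as 0, one cheap sanity check is to confirm mechanically that every taxID in the map really appears in both nodes.dmp and names.dmp. This is a minimal sketch, assuming NCBI-style .dmp files (fields separated by "\t|\t", taxID in the first field) and a two-column, tab-separated map (seqID, then taxID). The function and script names are hypothetical.

```python
import sys


def read_dmp_ids(path):
    """Return the set of taxIDs found in the first column of an NCBI-style .dmp file."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            # .dmp fields are separated by "\t|\t"; the taxID is the first field.
            first = line.split("\t|\t", 1)[0].strip()
            if first:
                ids.add(first)
    return ids


def find_unknown_taxids(map_path, nodes_path, names_path):
    """Yield (line_no, seqid, taxid, problem) for map rows whose taxID is not in the .dmp files."""
    nodes = read_dmp_ids(nodes_path)
    names = read_dmp_ids(names_path)
    with open(map_path) as fh:
        for line_no, line in enumerate(fh, start=1):
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != 2:
                continue  # malformed rows are skipped here, not reported
            seqid, taxid = fields[0], fields[1].strip()
            if taxid not in nodes:
                yield line_no, seqid, taxid, "not in nodes.dmp"
            elif taxid not in names:
                yield line_no, seqid, taxid, "not in names.dmp"


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical script name): python check_taxids.py seqid2taxid.map nodes.dmp names.dmp
    for row in find_unknown_taxids(*sys.argv[1:]):
        print(*row, sep="\t")
```

An empty result only means the map and .dmp files agree with each other; it does not rule out a mismatch between the map and the sequence IDs in the FASTA file.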
I noticed that if you set the -k option high enough, the taxIDs appear even for the seqIDs that were previously problematic (in my case I tried -k 1000), but I haven't managed to understand what's happening so far.
When inspecting my .map file, I noticed a number of formatting issues in it, such as line breaks and merged entries within a seqID/taxID pair (a quick check for me was to see whether any lines started with a number). I was able to resolve my issue by creating the .map file manually. Hope it helps anyone out there!
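The "lines starting with a number" check described above can be extended a little to catch the other formatting problems mentioned (broken or merged lines). This is a minimal sketch, assuming the map is a plain two-column, tab-separated file (seqID, then numeric taxID). The function name is hypothetical, and the digit-prefix rule is only a heuristic, since some legitimate IDs may start with a digit.

```python
import sys


def check_map_format(path):
    """Return a list of (line_no, problem) tuples for suspicious rows in a seqID/taxID map."""
    problems = []
    # newline="" disables newline translation so Windows "\r\n" endings stay visible.
    with open(path, newline="") as fh:
        for line_no, raw in enumerate(fh, start=1):
            if raw.endswith("\r\n"):
                problems.append((line_no, "Windows (CRLF) line ending"))
            fields = raw.rstrip("\r\n").split("\t")
            if len(fields) != 2:
                problems.append((line_no, f"expected 2 tab-separated fields, found {len(fields)}"))
                continue
            seqid, taxid = fields
            if not taxid.isdigit():
                problems.append((line_no, f"taxID {taxid!r} is not numeric"))
            # Heuristic from the comment above: a seqID starting with a digit may be a
            # fragment of a broken line (some real IDs may also start with a digit).
            if seqid[:1].isdigit():
                problems.append((line_no, f"seqID {seqid!r} starts with a digit"))
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    for line_no, problem in check_map_format(sys.argv[1]):
        print(f"line {line_no}: {problem}")
```

Rows flagged here are worth fixing before rebuilding the index, because a row the builder cannot parse would plausibly leave that seqID without a taxID.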
I redid some of the steps and everything looks the same, but the problem has been resolved.
Either one of those steps, or a combination of them, fixed it.
I did exactly the same thing and the problem has been resolved. What happened is still a mystery.
Hello!
I'm trying to index the SILVA database for Centrifuge. I built the acc2taxid map, nodes.dmp and names.dmp files just fine, but I noticed that some sequences were assigned a taxID of 0 when classifying. I saw the same behavior when using those files as produced by Kraken2.
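One thing that could produce a taxID of 0 for a sequence that is in the index is a seqID in the FASTA file that has no exact match in the acc2taxid map (for example a '.' vs '_' difference). This is a minimal sketch of that check, assuming (as I understand it, worth confirming in the Centrifuge manual) that centrifuge-build uses the first whitespace-delimited word of each FASTA header as the seqID, and that the map's first tab-separated column holds the seqID. All function names are hypothetical.

```python
import sys


def fasta_ids(path):
    """Return the first whitespace-delimited word of every FASTA header, in file order."""
    ids = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                words = line[1:].split()
                ids.append(words[0] if words else "")
    return ids


def map_ids(path):
    """Return the set of seqIDs in the first tab-separated column of the map."""
    with open(path) as fh:
        return {line.split("\t", 1)[0].strip() for line in fh if line.strip()}


def unmapped_sequences(fasta_path, map_path):
    """List FASTA IDs with no exact match in the map (e.g. '.' vs '_' differences)."""
    mapped = map_ids(map_path)
    return [sid for sid in fasta_ids(fasta_path) if sid not in mapped]


if __name__ == "__main__" and len(sys.argv) == 3:
    for sid in unmapped_sequences(sys.argv[1], sys.argv[2]):
        print(sid)
```

The comparison is deliberately exact and case-sensitive, since that is the strictest assumption about how the IDs are matched.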
As you can see, for readID 6d3e5d85b2aa35d58347bb4b9b203e43, many of the matches have a seqID assigned but a taxID of 0. I double-checked both the centrifuge-build command (no missing taxonomy IDs in the output) and the various files, and everything seems fine.
The query readID cb6ee53401962a26788af21de2a16f67 at the end behaves as expected: its seqID does have the expected taxID.
For instance, for the seqIDs U92195.1.1541 and MF457876.1.1456:
acc2taxid:
nodes.dmp
names.dmp
And from centrifuge-inspect:
I also tried replacing the dots in the sequence IDs with '_', just in case, but the result is the same. I would expect a taxID of 0 if the sequence was not found in the taxonomy, or if the seqID was actually an LCA, like the third assignment to 'genus'. In these cases, however, my query was assigned to a specific seqID, so I would expect to find the corresponding taxID.
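To separate the two cases described above (a legitimate LCA assignment vs. a specific seqID reported with taxID 0), the classification output can be filtered. This is a minimal sketch, assuming the default tab-separated Centrifuge output with a header row containing readID, seqID and taxID columns, and assuming that LCA assignments carry a rank name such as 'genus' in the seqID column, as in the output above. RANK_NAMES is a hypothetical, possibly incomplete list.

```python
import csv
import sys

# Hypothetical list of rank names; LCA assignments are assumed to carry one of
# these in the seqID column instead of a real sequence ID.
RANK_NAMES = {"superkingdom", "kingdom", "phylum", "class", "order",
              "family", "genus", "species", "subspecies", "no rank"}


def zero_taxid_hits(path):
    """Return (readID, seqID) pairs where a specific seqID was reported with taxID 0."""
    hits = []
    with open(path, newline="") as fh:
        # Assumes a tab-separated file whose header row names the columns.
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["taxID"] == "0" and row["seqID"] not in RANK_NAMES:
                hits.append((row["readID"], row["seqID"]))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    for read_id, seq_id in zero_taxid_hits(sys.argv[1]):
        print(read_id, seq_id, sep="\t")
```

The seqIDs it returns can then be looked up directly in the map and .dmp files, which narrows the question down to specific entries.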
Maybe there's something I don't fully understand, but if you have any idea what could be happening, or why, I'd be glad to hear it :)
Thanks a lot!