Closed kweitemier closed 5 years ago
In your case of 111304, the hits are the full reads, in other words, there is no hit from Centrifuge onto other taxonomy ids. So this case is slightly different from the "A,B" example, where Centrifuge finds two different hits. In Centrifuge, the searching hits stage does not exclude any genome even with --exclude-taxids options. It then filters the hits that matched in the node or children of the tax ids from --exclude-taxids. Does this help?
Thanks for the quick response! It looks as though there are full read hits to taxa 7904, 111304, and 1421439 that all receive the same score. If I exclude one of these taxa, shouldn't all the hits to the others still be reported?
For example, when I exclude 1421439 I should still see seven hits to 111304 and two hits to 7904. Similarly, when I exclude 111304 I should still see the hits to 1421439 and 7904, correct?
(Note that these three taxa are at the same level in the taxonomy tree, the parent node for each of them is taxon 7901.)
You are right. What you found is indeed a bug in Centrifuge! I've uploaded the new file in github which should fix this issue. Thanks for letting us know about this.
Hello,
I've been working with Centrifuge using a custom database and have been using
--exclude-taxids
to disregard several taxa in the database. My understanding was that--exclude-taxids
would essentially behave as though the listed taxa were not in the database. That is, if a read hits taxon A with a high score and hits taxon B with a slightly lower score, and k=1, then taxon A would be reported. But, if taxon A was excluded then the only hit would be to taxon B and it would be reported.I have a read that I'm testing that, when k is large, produces the following classification output.
But, when I try the same command excluding taxon 1421439 I get the following:
And when I try excluding only taxon 111304:
Is this demonstrating expected behavior?
This is with 1.0.4 beta, 64-bit Compiler: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) Options: -O3 -m64 -msse2 -funroll-loops -g3 -std=c++11 -DPOPCNT_CAPABILITY Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
I'd also like to ask if a description of the fields in the metrics file is available somewhere.
Thanks for any insights you can provide.