Closed mmp3 closed 5 years ago
Resolved. This behavior is caused by using the RefSeq database in `p+h+v`.
The reason is that in the RefSeq database, many genomes are assigned the same taxonomy ID, as observed in the output of `centrifuge-inspect --conversion-table`.
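To check how widespread the shared IDs are, the conversion table can be tallied per taxonomy ID. This is only a minimal sketch: it assumes the table is tab-separated with the sequence ID in the first column and the taxonomy ID in the second, and the function name `genomes_per_taxid` and the sample rows are made up for illustration. Note that it counts sequences, so a genome with a chromosome plus plasmids contributes more than one row.

```python
import csv
import io
from collections import Counter


def genomes_per_taxid(table_file):
    """Count how many sequence IDs map to each taxonomy ID.

    Assumes two tab-separated columns: sequence ID, taxonomy ID.
    """
    counts = Counter()
    for row in csv.reader(table_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        counts[row[1].strip()] += 1
    return counts


# Hypothetical sample rows; in practice, open a file holding the saved
# output of `centrifuge-inspect --conversion-table` instead.
sample = io.StringIO(
    "seq_A\t562\n"
    "seq_B\t562\n"
    "seq_C\t511145\n"
)

counts = genomes_per_taxid(sample)

# Taxonomy IDs shared by more than one sequence cannot be told apart
# at the leaf level.
shared = {taxid: n for taxid, n in counts.items() if n > 1}
print(shared)
```

Any taxonomy ID that shows up in `shared` is a candidate for the species-level rows described in the question below.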
I am trying to prevent least common ancestor classification altogether and instead have centrifuge report relative abundance only for strains/genomes ("leaf"). I understand that in many cases this will be very, very slow. But in certain scenarios, this is desired.
I tried option `-a` to report all hits, but the `report` file still reports abundance at the species level for some taxa. I tried option `-k 100000`, but the `report` file still has abundance at the species level for some taxa.

For example, I always get an abundance for the species Escherichia coli (taxonomy ID = 562), but then also get relative abundances for some E. coli strains (`leaf`). But I don't want any read classifications to be promoted to species level. I want only `leaf`-level labels for each read, even if that means that a read gets hundreds of leaf labels.

How do I force centrifuge to record all possible mapping positions for each read so that the relative abundances in `report` are only for `leaf`, not `species` or anything higher?
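For anyone trying to measure how much abundance ends up above `leaf`, here is a rough diagnostic sketch. It does not change how centrifuge classifies reads; it only sums the `abundance` column per rank. It assumes the report is tab-separated with a header row containing `taxRank` and `abundance` columns, and the strain names and strain taxonomy IDs in the sample are hypothetical.

```python
import csv
import io
from collections import defaultdict


def abundance_by_rank(report_file):
    """Sum the abundance column for each value of taxRank.

    Assumes a tab-separated report with a header row that includes
    'taxRank' and 'abundance' columns.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(report_file, delimiter="\t"):
        totals[row["taxRank"]] += float(row["abundance"])
    return dict(totals)


# Hypothetical report contents; the strain rows and their IDs are invented.
sample_report = io.StringIO(
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
    "Escherichia coli\t562\tspecies\t0\t100\t10\t0.5\n"
    "hypothetical strain 1\t900001\tleaf\t0\t60\t30\t0.25\n"
    "hypothetical strain 2\t900002\tleaf\t0\t40\t20\t0.25\n"
)

totals = abundance_by_rank(sample_report)
print(totals)
```

With the made-up sample, half of the abundance sits at `species` and half at `leaf`, which is the pattern described above for E. coli.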