DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Reducing false positive detection #89

Open fconstancias opened 6 years ago

fconstancias commented 6 years ago

Hi all,

Thanks a lot for your efforts developing this tool. I am using centrifuge for taxonomic profiling of metagenomes from various ecosystems, and I am currently building databases for bacteria, viruses, fungi and archaea using your centrifuge-download tool.

To reduce the detection of potential false-positive taxa, could you help me adopt a rational approach to filtering the centrifuge classification output using parameters such as hitLength and score?

Is there any way to get the breadth of coverage for each taxon? This might also be a good way to get rid of potential false positives.

I would appreciate any guidance or suggestions.

Thanks a lot.

Flo

hailiangmei commented 6 years ago

Hi Flo,

I don't think there is an easy way to reduce false positives (FP) here, other than checking the whole-genome coverage by running an alignment based on the hits that centrifuge reports.

But perhaps https://github.com/fbreitwieser/krakenhll is worth checking out.

Cheers, Leon
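
For readers who want to try the coverage check described above, here is a minimal sketch of how breadth of coverage could be computed once reads have been re-aligned to a reference genome. It is not part of centrifuge: the function name `breadth_of_coverage`, the half-open 0-based interval convention, and the example numbers are all assumptions made for illustration. The intervals would come from whatever aligner you use.

```python
def breadth_of_coverage(intervals, genome_length):
    """Return the fraction of genome positions covered by at least one read.

    intervals: iterable of (start, end) alignment positions, 0-based and
               half-open (hypothetical convention chosen for this sketch).
    genome_length: length of the reference genome in bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip each interval to the genome boundaries.
        start = max(start, 0)
        end = min(end, genome_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Made-up example: three alignments on a 1,000 bp genome.
example = [(0, 100), (50, 150), (300, 400)]
print(breadth_of_coverage(example, 1000))
```

A taxon with many reads piled onto a small region gives a low breadth despite a high read count, which is the pattern one would usually treat as a likely false positive.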

fconstancias commented 6 years ago

Hi Leon,

Thanks for your help. I have already started to explore krakenhll. I have also just come across https://github.com/seqan/slimm, which uses genome coverage information to filter out the noise. That makes a lot of sense to me.

Cheers,

Flo

khyox commented 6 years ago

Another alternative for filtering (and visualizing) Centrifuge results by score, length, log length or score/length is Recentrifuge. You don't need negative control samples to use it in this way, but if you have one or more of them, Recentrifuge can go further.
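
If you only need a quick threshold filter on the per-read classification file, something along these lines may be enough as a starting point. This is a rough sketch, not how Recentrifuge or centrifuge do it internally. It assumes the tab-separated classification output with its usual header (readID, seqID, taxID, score, 2ndBestScore, hitLength, queryLength, numMatches). The function name `filter_hits`, the default thresholds, and the example rows are made up and would need tuning for your read length and database.

```python
import csv
import io


def filter_hits(handle, min_hit_length=60, min_score=300):
    """Yield classification rows that pass both thresholds.

    handle: open text file (or file-like object) with the tab-separated
            centrifuge classification output, including the header line.
    min_hit_length, min_score: hypothetical cut-offs, not recommended values.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if (int(row["hitLength"]) >= min_hit_length
                and int(row["score"]) >= min_score):
            yield row


# Made-up example input with one strong hit, one weak hit, one unclassified read.
EXAMPLE = (
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "read1\tNC_000913.3\t562\t4225\t0\t80\t100\t1\n"
    "read2\tNC_002695.2\t83334\t49\t0\t22\t100\t1\n"
    "read3\tunclassified\t0\t0\t0\t0\t100\t1\n"
)

kept = list(filter_hits(io.StringIO(EXAMPLE)))
for row in kept:
    print(row["readID"], row["taxID"], row["hitLength"], row["score"])
```

For a real run, pass an open file instead of the `io.StringIO` example, e.g. `with open(path) as fh: kept = list(filter_hits(fh))`, where `path` points to your own classification file.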