fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

Kraken, KrakenUniq or centrifuge? #33

Open juulluu21 opened 5 years ago

juulluu21 commented 5 years ago

It’s sort of difficult to keep pace with all your different releases 😀. So, which one should I use: Kraken, KrakenUniq, or Centrifuge? I have just started with Centrifuge; should I move to KrakenUniq?

Thanks.

fbreitwieser commented 5 years ago

Hi @juulluu21, KrakenUniq of course ;). More seriously, I can give you the following info regarding the differences between these three programs:

If you can, you should try both KrakenUniq and Centrifuge (as well as maybe some other great classifiers out there) and see how well they work on your data and use-case!

sschmeier commented 5 years ago

Where does Kraken2 fit in here? Also, can/should Bracken be used with KrakenUniq? Cheers

VadimDu commented 5 years ago

Hi Sebastian,

This is a good question! I would also like to know why one should use KrakenUniq over Kraken2. Regarding Bracken, I asked the author of Bracken the same question yesterday; her answer was: "The KrakenUniq information are not considered currently in Bracken's algorithm and it would require a bit of a change in the algorithm and quite a bit of additional testing to incorporate it"

I have never tried Kraken1/2; I started directly with KrakenUniq, and then I discovered the issue with accurate abundance estimation of the taxa present. So the great thing about KrakenUniq is the strain/plasmid resolution, but at the moment you won't get abundance estimates as you would with MetaPhlAn2, for example (a species table with relative abundances). But you can manually examine individual species/strains of interest from the report, or use their tax_ID to extract their matched reads for downstream analysis (with krakenuniq-extract-reads).

Regards, Vadim

fbreitwieser commented 5 years ago

Hi @VadimDu @sschmeier, Kraken2 is a new development that enables using smaller databases than Kraken 1 or KrakenUniq. However, Kraken2 does not collect information on the unique k-mer coverage that KrakenUniq calculates. You will not get better classifications with Kraken2; the improvements are only in database size and speed. In that way Kraken2 competes with Centrifuge, and you should try both to see which one works better on your data if a small database size is important.

KrakenUniq, on the other hand, is the only metagenomics classifier (to the best of my knowledge) that gives you k-mer coverage information. For example, you may see that Mycobacterium tuberculosis (Mtb) has 10k reads with either Centrifuge or Kraken2. But are those reads all mapping to the same position within one Mtb strain, thus making the identification unreliable? Neither Centrifuge nor Kraken nor Kraken2 gives you the answer; you'd have to extract all the matching reads, re-align them against a selected genome, and run e.g. bamcov :) on the alignment to see the genome coverage. KrakenUniq's unique k-mer count, on the other hand, reports the number of distinct Mtb k-mers that were hit by any of the reads, and with that it can be easier to decide whether a classification is good or not. For standard usage I recommend filtering out all identifications that fall below a certain unique k-mer count threshold, such as 1k unique k-mers.

fbreitwieser commented 5 years ago

@VadimDu, you can use KrakenUniq with Bracken. My suggestion is to filter on the unique k-mer count column first (awk '$4 >= 1000' KU_REPORT). That way, only good identifications are given to Bracken.
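
For anyone who prefers Python to awk, here is a minimal sketch of the same filtering step. It assumes the KrakenUniq report is tab-separated with the unique k-mer count in the fourth column (as the awk command above implies) and that comment lines start with `#`. The function name, the sample rows, and the taxIDs are made up for illustration and are not part of KrakenUniq.

```python
def filter_report(lines, min_kmers=1000):
    """Keep report rows whose unique k-mer count (4th column) is >= min_kmers.

    Comment lines, blank lines, and the header row are passed through
    unchanged so that the output still looks like a report.
    """
    kept = []
    for line in lines:
        row = line.rstrip("\n")
        if not row or row.startswith("#"):
            kept.append(row)
            continue
        fields = row.split("\t")
        # The header row has the text "kmers" in column 4, not a number.
        if len(fields) < 4 or not fields[3].isdigit():
            kept.append(row)
            continue
        if int(fields[3]) >= min_kmers:
            kept.append(row)
    return kept


# Made-up example data (hypothetical taxa and taxIDs), only to show the call.
SAMPLE_REPORT = "\n".join([
    "# made-up example report",
    "\t".join(["%", "reads", "taxReads", "kmers", "dup", "cov", "taxID", "rank", "taxName"]),
    "\t".join(["50.0", "500", "500", "20000", "1.2", "0.01", "1111", "species", "Species A"]),
    "\t".join(["30.0", "300", "300", "40", "7.5", "0.00001", "2222", "species", "Species B"]),
])

if __name__ == "__main__":
    for row in filter_report(SAMPLE_REPORT.splitlines(), min_kmers=1000):
        print(row)
```

The filtered rows can be written to a new file and used in place of the original report. Whether 1000 is the right cutoff depends on your data; it is just the value suggested in this thread.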

VadimDu commented 5 years ago

Hi @fbreitwieser,

Thank you very much for your answer! I was actually planning to ask you what is a good threshold to filter based on unique k-mer counts... :-)

I have tried to run Bracken on KrakenUniq reports and got an incompatibility error saying that it could not locate the install dir of kraken1 or krakenw, even after I changed the script to look for KrakenUniq. I have asked the author of Bracken; she said that even though the read counts output of KrakenUniq is similar to that of Kraken 1, the more complicated part is that "KrakenUniq information are not considered currently in Bracken's algorithm and it would require a bit of a change in the algorithm and quite a bit of additional testing to incorporate it"

Have you tried to run Bracken with KrakenUniq reports that include the extended NCBI taxIDs (genome assemblies and plasmids)?

What I do at the moment for downstream analysis is filter based on unique k-mer counts and then extract only the "species"-level taxonomy read counts. Finally, I can filter out very low-abundance species and proceed with various normalization methods or transform to relative abundance based on the total reads in a sample. Does this make sense to you?

Thank you very much Vadim
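
The workflow described above (filter on unique k-mers, keep species-level rows, convert to relative abundance) could be sketched roughly as follows. This is only an illustration under assumptions: it takes the report to be tab-separated with the columns %, reads, taxReads, kmers, dup, cov, taxID, rank, taxName, and it divides each species' clade read count by a total read count that you supply yourself. The function name, parameters, and sample rows are hypothetical.

```python
def species_relative_abundance(report_text, total_reads,
                               min_kmers=1000, min_fraction=0.0):
    """Return {species name: reads / total_reads} for rows that pass the filters.

    Assumed column order (an assumption, check your own report):
    %, reads, taxReads, kmers, dup, cov, taxID, rank, taxName
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    abundance = {}
    for line in report_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 9 or not fields[3].isdigit():
            continue  # skip the header row and malformed rows
        reads = int(fields[1])
        kmers = int(fields[3])
        rank = fields[7]
        name = fields[8].strip()  # names may be indented in the report
        if rank != "species" or kmers < min_kmers:
            continue
        fraction = reads / total_reads
        if fraction >= min_fraction:
            abundance[name] = fraction
    return abundance


# Made-up example data (hypothetical taxa and taxIDs).
SAMPLE_REPORT = "\n".join([
    "# made-up example report",
    "\t".join(["%", "reads", "taxReads", "kmers", "dup", "cov", "taxID", "rank", "taxName"]),
    "\t".join(["80.0", "800", "0", "20040", "1.3", "0.01", "1000", "genus", "Genus X"]),
    "\t".join(["50.0", "500", "500", "20000", "1.2", "0.01", "1111", "species", "  Species A"]),
    "\t".join(["30.0", "300", "300", "40", "7.5", "0.00001", "2222", "species", "  Species B"]),
])

if __name__ == "__main__":
    print(species_relative_abundance(SAMPLE_REPORT, total_reads=1000))
```

Note that these fractions are shares of reads, not genome-size-corrected abundances, so they are not directly comparable to what a tool like MetaPhlAn2 or Bracken reports.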

MjelleLab commented 3 years ago

@fbreitwieser

Hi, you mention that 1000 k-mers could be a good filtering threshold, but in the paper you mention that the ratio between the reads and the k-mers is important. Do you think a read/k-mer ratio is even better for thresholding? For instance, 50x or 100x more k-mers than reads for a specific OTU? Have you tested the specificity and sensitivity of such thresholding?

Best,
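
To make the ratio idea in the question above concrete, here is a small sketch that computes unique k-mers per read for one taxon and compares it to a cutoff. The 50x default is just the number floated in the question, not a validated threshold, and all function names are hypothetical. The last helper uses the general fact that a single read of length L contains at most L - k + 1 k-mers, which bounds the ratio that one read can contribute.

```python
def kmers_per_read(reads, kmers):
    """Unique k-mers divided by reads for one taxon (0.0 if there are no reads)."""
    return kmers / reads if reads > 0 else 0.0


def passes_ratio(reads, kmers, min_ratio=50.0):
    """True if the taxon has at least min_ratio unique k-mers per read.

    min_ratio=50.0 is an illustrative value taken from the question,
    not a recommendation.
    """
    return kmers_per_read(reads, kmers) >= min_ratio


def max_kmers_in_read(read_length, k):
    """Upper bound on the number of k-mers a single read can contain."""
    return max(read_length - k + 1, 0)


if __name__ == "__main__":
    # Made-up numbers: many reads but few unique k-mers suggests that the
    # reads pile up on a small part of the genome.
    print(passes_ratio(reads=10000, kmers=300))
    print(passes_ratio(reads=100, kmers=5000))
    print(max_kmers_in_read(read_length=100, k=31))
```

Because of that upper bound, a fixed cutoff such as 100x may be unreachable for short reads, so any ratio threshold would need to take read length and k into account.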