genomic-medicine-sweden / taxprofiler

Taxonomic profiling of shotgun metagenomic data
https://nf-co.re/taxprofiler
MIT License

Confidence score in Kraken2 #31

Open LilyAnderssonLee opened 1 year ago

LilyAnderssonLee commented 1 year ago

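Confidence score: for a read assigned to a taxon, the fraction of the read's k-mers (excluding those with ambiguous bases) that map to that taxon or to any taxon below it in the tree, i.e. within its clade. The default threshold is 0. A classification is kept only if its score meets or exceeds this predefined threshold. Otherwise Kraken2 moves the label up the tree until the threshold is met, and leaves the read unclassified if no ancestor qualifies. The confidence threshold therefore has a large impact on the number of reads classified and on the number of taxa reported as present in a sample.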
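A minimal Python sketch of the threshold check, following the Kraken2 manual's description: score = C/Q, where C is the number of k-mers mapped within the label's clade and Q is the number of k-mers queried. The toy taxonomy, taxids and hit counts are hypothetical. The sketch takes the initial label as given rather than reproducing Kraken2's own label-selection step, and Kraken2 internally works on minimizers, which the manual describes as k-mers.

```python
# Toy taxonomy (hypothetical): taxid -> parent taxid. 1 is the root, 0 means "unclassified".
PARENT = {1: 0, 2: 1, 3: 2, 4: 2, 5: 1}


def in_clade(taxon, clade_root, parent):
    """Return True if `taxon` equals `clade_root` or descends from it."""
    while taxon != 0:
        if taxon == clade_root:
            return True
        taxon = parent[taxon]
    return False


def apply_confidence(label, hits, queried_kmers, threshold, parent):
    """Move `label` up the tree until C/Q meets or exceeds `threshold`.

    hits: k-mer counts per taxon (k-mers without a database match are absent).
    queried_kmers: Q, the number of k-mers without ambiguous bases.
    Returns the final taxid, or 0 if the read ends up unclassified.
    """
    if queried_kmers == 0:
        return 0
    while label != 0:
        # C: k-mers whose taxon lies in the clade rooted at the current label
        clade_hits = sum(n for taxon, n in hits.items()
                         if in_clade(taxon, label, parent))
        if clade_hits / queried_kmers >= threshold:
            return label
        label = parent[label]  # not enough support: try the parent
    return 0  # ran off the root: unclassified


if __name__ == "__main__":
    # One hypothetical read: 10 k-mers queried, 7 matched, initially labelled taxid 3
    hits = {3: 4, 4: 2, 5: 1}
    for t in (0, 0.1, 0.3, 0.5, 0.7, 0.9):
        print(t, apply_confidence(3, hits, 10, t, PARENT))
```

With a threshold of 0 the initial label is always kept, which is why the default maximizes the number of classified reads; raising the threshold first pushes reads to higher ranks and eventually leaves them unclassified.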

A paper discusses how the confidence score affects classification. Its main suggestion is that the appropriate threshold depends on factors such as sample origin (human or environmental), diversity (low or high), and the goal (minimizing false positives or minimizing false negatives).

TO DO: Run the pipeline on 12 validation samples with confidence scores of 0, 0.1, 0.3, 0.5, 0.7, and 0.9.
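The sweep could also be scripted directly against the kraken2 command line, for example to test thresholds outside the pipeline. This is a sketch only: the sample IDs, FASTQ and output file names, and database path are hypothetical placeholders, while `--db`, `--confidence`, `--report` and `--output` are standard kraken2 options.

```python
import shlex

# Thresholds from the TO DO above
THRESHOLDS = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]


def kraken2_commands(samples, db, thresholds):
    """Build one kraken2 command (as an argument list) per (sample, threshold) pair.

    File naming is hypothetical: <sample>.fastq in, <sample>_conf<t>.* out.
    """
    commands = []
    for sample in samples:
        for t in thresholds:
            prefix = f"{sample}_conf{t}"
            commands.append([
                "kraken2",
                "--db", db,
                "--confidence", str(t),
                "--report", f"{prefix}.kreport",
                "--output", f"{prefix}.kraken2",
                f"{sample}.fastq",
            ])
    return commands


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 12 validation samples
    samples = [f"sample{i:02d}" for i in range(1, 13)]
    for cmd in kraken2_commands(samples, "kraken2_db", THRESHOLDS):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps the commands safe to pass to `subprocess.run` without shell quoting issues.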

LilyAnderssonLee commented 1 year ago

Conclusion: As reported in the paper mentioned above, a higher confidence threshold results in fewer classified reads, while a lower threshold leads to more false positives.
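To quantify this trade-off for each threshold, the classification rate and the number of distinct assigned taxids can be read from Kraken2's per-read output (the file written by `--output`), whose tab-separated columns start with C/U, read ID and taxid. A minimal sketch; the example lines are made up. Distinct taxids at read level also count reads pushed to higher ranks, so this is only a rough proxy for the number of taxa detected.

```python
def summarize_kraken2_output(lines):
    """Return (fraction of reads classified, number of distinct taxids assigned).

    Expects Kraken2's per-read output: tab-separated, with column 1 = "C"/"U",
    column 2 = read ID and column 3 = assigned taxid.
    """
    total = classified = 0
    taxa = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        if fields[0] == "C":
            classified += 1
            taxa.add(fields[2])
    fraction = classified / total if total else 0.0
    return fraction, len(taxa)


def summarize_file(path):
    """Convenience wrapper for one Kraken2 per-read output file."""
    with open(path) as handle:
        return summarize_kraken2_output(handle)


if __name__ == "__main__":
    # Made-up example lines: status, read ID, taxid, read length, k-mer map
    example = [
        "C\tread1\t562\t150\t562:116",
        "U\tread2\t0\t150\t0:116",
        "C\tread3\t1280\t150\t1280:116",
        "C\tread4\t562\t150\t562:116",
    ]
    print(summarize_kraken2_output(example))
```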

For clinical cases, we want as many reads as possible to be assigned so that any potentially relevant organism is detected. We therefore intend to keep the default value of 0 for now.