Open LilyAnderssonLee opened 1 year ago
Conclusion:
1: Kraken2 and Centrifuge assigned the same 9 reads to HHV7 and the same single read to HHV4.
2: Centrifuge assigned 98 reads to the Viruses category, while Kraken2 assigned 33 reads to the same category.
3: Centrifuge predicted more virus species (56) than Kraken2 (24). However, Centrifuge produced significantly more false positives, as indicated by BLAST.
4: The false positives of Centrifuge can be reduced by considering `hitLength` and `numMatches` in the classification. For instance, Alcelaphine herpesvirus 1 was falsely reported by Centrifuge, but its `hitLength` was only 23 bp, which is far below a reliable threshold. I would suggest setting the minimum `hitLength` somewhere between 50 bp and 100 bp to reduce false positives. As for `numMatches`, we could skip it for now, since I am not sure how to handle species that share the same genomic regions.
5: Centrifuge assigned 9 reads to Human endogenous retrovirus K113, which were not reported by Kraken2. These 9 reads were confirmed by BLAST.
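The `hitLength` cutoff suggested in point 4 could be applied as a post-filter on the Centrifuge classification output. Below is a minimal sketch, assuming the default tab-separated output with the header `readID, seqID, taxID, score, 2ndBestScore, hitLength, queryLength, numMatches`. The function name `filter_by_hit_length`, the 50 bp default, and the example rows (read names, sequence IDs, taxIDs) are made up for illustration and are not taken from the clinical case.

```python
import csv
import io


def filter_by_hit_length(handle, min_hit_length=50):
    """Return the Centrifuge rows whose hitLength is at least min_hit_length.

    `handle` is any file-like object holding Centrifuge's tab-separated
    classification output, including its header line.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if int(row["hitLength"]) >= min_hit_length]


# Made-up example: read_a mimics a short 23 bp hit, read_b a full-length hit.
example_output = (
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "read_a\tseq_1\t111\t64\t0\t23\t150\t1\n"
    "read_b\tseq_2\t222\t18225\t0\t150\t150\t1\n"
)

kept = filter_by_hit_length(io.StringIO(example_output))
print([row["readID"] for row in kept])
```

With the 50 bp default, only `read_b` survives; the 23 bp hit is dropped. The threshold is a parameter so that values between 50 and 100 bp can be compared during validation.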
During the ENNGS workshop, it was mentioned that one lab is using 5000 as a quality filter.
Good to know. I think we need to test all the parts of `taxprofiler` that we are using with simulated data. I am wondering whether we need to add this to the `taxprofiler` validation report.
Since we have not tested this metric, I would suggest waiting until the next version of the validation.
Yes, that makes sense. We need to do this validation on simulated data to improve the performance of `taxprofiler` and to meet the requirements of IVDR, perhaps sometime in late autumn or winter, after the major release of `taxprofiler` for long reads.
Centrifuge uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index.
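To illustrate the idea behind a BWT/FM-index lookup, here is a toy sketch of the Burrows-Wheeler transform and FM-index backward search. It is not Centrifuge's implementation: it builds the BWT by sorting all rotations and counts characters by scanning the string, which is only practical for very short inputs. The function names `bwt` and `count_occurrences` are hypothetical, and the text is assumed not to contain the `$` terminator.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy approach)."""
    text += "$"  # terminator, assumed absent from the input text
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # first[c] = index of the first row starting with c in the sorted rotations
    first = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)

    def occ(c, i):
        # Number of times c appears in bwt_str[:i] (linear scan for clarity)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current half-open range of matching rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)
print(count_occurrences(transformed, "ana"))
```

The search walks the pattern from its last character to its first, narrowing a range of rows in the sorted rotation table at each step, so the query cost depends on the pattern length rather than on the size of the indexed text.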
Run Centrifuge and Kraken2 on the samples from clinical case #374764. In the DNA sample, we have confirmed that 9 reads were assigned to HHV7, and these reads were identified as true positives.
TO ANSWER:
1: Can we also detect HHV7 using Centrifuge, and how many reads were assigned to it?
2: Which classifier identifies more organisms?
3: Which classifier has more false positives in the assigned reads when validated through blasting?
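For question 1, one way to check whether both classifiers assign the same reads to a taxon is to collect the read IDs per taxID from each output and intersect them. This is a rough sketch, assuming Kraken2's standard per-read output (columns: C/U flag, read ID, taxID, length, LCA mapping, produced without `--use-names`) and Centrifuge's default tab-separated output with a header line. The function names, read names, and the taxID `999` are placeholders, not values from the clinical case.

```python
import csv
import io


def kraken2_reads_for_taxid(handle, taxid):
    """Read IDs that Kraken2 classified ('C') to the given taxID."""
    reads = set()
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) >= 3 and fields[0] == "C" and fields[2] == str(taxid):
            reads.add(fields[1])
    return reads


def centrifuge_reads_for_taxid(handle, taxid):
    """Read IDs that Centrifuge assigned to the given taxID."""
    reads = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["taxID"] == str(taxid):
            reads.add(row["readID"])
    return reads


# Made-up outputs: both tools assign read_1 to placeholder taxID 999.
kraken2_example = (
    "C\tread_1\t999\t150\t999:116\n"
    "U\tread_2\t0\t150\t0:116\n"
)
centrifuge_example = (
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "read_1\tseq_1\t999\t18225\t0\t150\t150\t1\n"
    "read_3\tseq_1\t999\t64\t0\t23\t150\t1\n"
)

k2_reads = kraken2_reads_for_taxid(io.StringIO(kraken2_example), 999)
cf_reads = centrifuge_reads_for_taxid(io.StringIO(centrifuge_example), 999)
shared = k2_reads & cf_reads
centrifuge_only = cf_reads - k2_reads
print(sorted(shared), sorted(centrifuge_only))
```

Reads that appear only in the Centrifuge set are the natural candidates to verify with BLAST for question 3.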