Open MOREYCK opened 1 year ago
Hi @MOREYCK,
I have built handling for negative controls into the v2.1.0-dev version. PHX first tries to assign the taxa with FastANI, based on the top 20 hits from the Mash sketch that is built from all the bacterial genomes in RefSeq. I tried a few yeast samples, and they had either 0 Mash hits or a very bad hit (<80% ANI match with VERY low coverage). When PHX can't get a good match with FastANI, it falls back to reporting the taxa that Kraken2 assigned using weighted scaffolds (for the yeast samples, the Kraken2 match was human). In both cases, 0 Mash hits or a hit with <80% ANI, the "error" will show up in the "WARNINGS" column of the GRiPHin summary.
We are also working on a new entry point in PHX that will have a more limited database, which will make picking a negative control easier. This is the version we plan to do our validation with, but I don't have a timeline for that yet; most likely early 2024.
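To make the fallback order concrete, here is a rough sketch of the decision described above. It is only an illustration and not PHX's actual code: `assign_taxon`, the `Hit` tuple, and the warning strings are hypothetical names, and only the 80% ANI cutoff comes from the description. The coverage check is left out because no threshold was given for it.

```python
from typing import List, NamedTuple, Tuple

ANI_THRESHOLD = 80.0  # percent; the "<80% ANI" cutoff mentioned above


class Hit(NamedTuple):
    """Hypothetical container for one FastANI result."""
    taxon: str
    ani: float  # percent identity


def assign_taxon(fastani_hits: List[Hit], kraken2_taxon: str) -> Tuple[str, str, List[str]]:
    """Return (taxon, source, warnings) following the fallback order above."""
    warnings: List[str] = []
    if not fastani_hits:
        # Nothing came back from the Mash sketch, so FastANI had nothing to compare.
        warnings.append("No Mash hits")
    else:
        best = max(fastani_hits, key=lambda h: h.ani)
        if best.ani >= ANI_THRESHOLD:
            return best.taxon, "FastANI", warnings
        warnings.append(f"Best FastANI hit below {ANI_THRESHOLD:.0f}% ANI")
    # No usable FastANI match: report the Kraken2 weighted-scaffold call instead.
    return kraken2_taxon, "Kraken2 weighted scaffolds", warnings
```

With an empty hit list, or a best hit under the cutoff, the function returns the Kraken2 call together with a warning, which mirrors what would end up in the "WARNINGS" column.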
So your options right now are:
Let me know your thoughts about this.
Describe the current status: For v2.0.2, when given a negative control sample (C. auris), PHX currently fails at the calculate assembly ratio step.
The Kraken database doesn't have C. auris in it, so the Kraken file reports 98% of reads as unclassified. For FastANI, PHX only has the bacterial genomes from RefSeq, so this sample does not get a taxon assigned.
Describe the solution you'd like: When the pipeline can't identify a taxon, report "no taxa found" in the GRiPHin_Summary.xlsx file and FAIL the sample.
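As a minimal sketch of the requested behavior (not how PHX or GRiPHin builds its summary today), something along these lines would cover it. The function name `summary_row`, the dictionary keys, and the `other_checks_passed` flag are all hypothetical and only serve the illustration.

```python
from typing import Dict, Optional


def summary_row(sample_id: str, taxon: Optional[str], other_checks_passed: bool = True) -> Dict[str, str]:
    """Build one summary row; a missing or blank taxon forces a FAIL."""
    identified = bool(taxon and taxon.strip())
    return {
        "sample": sample_id,
        # Report an explicit message instead of crashing downstream.
        "taxon": taxon.strip() if identified else "no taxa found",
        "status": "PASS" if identified and other_checks_passed else "FAIL",
    }
```

The point is that a sample with no assigned taxon still produces a row in the summary, marked as failed, rather than stopping the run at the assembly ratio step.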