CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

Create message/output when downstream and QC analysis is not complete or is interrupted due to taxa not assigned (taxa is not bacterial) #123

Open MOREYCK opened 10 months ago

MOREYCK commented 10 months ago

Describe the current status: In v2.0.2, when given a negative control sample (C. auris), PHX currently fails at the calculate assembly ratio step.

The Kraken database doesn't include C. auris, so the Kraken report shows 98% of reads as unclassified. For FastANI, PHX only has the bacterial genomes from RefSeq, so this sample is not assigned a taxon.

Describe the solution you'd like: When the pipeline can't identify a taxon, report "no taxa found" in the GRiPHin_Summary.xlsx file and FAIL the sample.

jvhagey commented 9 months ago

Hi @MOREYCK,

I have built handling for neg controls into the v2.1.0-dev version. PHX first tries to assign the taxa with FastANI, based on the top 20 hits from the MASH sketch that is built from all the bacterial genomes in RefSeq. I tried a few yeast samples and they had either 0 MASH hits or a very bad hit (<80% ANI match with VERY low coverage). When PHX can't get a good match with FastANI, it falls back and reports the taxa that kraken2 assigned with weighted scaffolds (for the yeast, the kraken2 match was human). In both cases, 0 MASH hits or a hit that is <80% ANI, these "errors" will show up in the "WARNINGS" column of the GRiPHin summary.
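To make the fallback order concrete, here is a rough Python sketch of the decision described above. It is not PHX's actual code: `assign_taxa`, `TaxaCall`, and the `(taxon, ani_percent)` input shape are hypothetical names made up for illustration. The warning strings and the "Taxa_Source" values are the ones quoted elsewhere in this thread, and the coverage check is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ANI_THRESHOLD = 80.0  # percent; the cutoff mentioned in this thread


@dataclass
class TaxaCall:
    """Hypothetical container for the outcome of taxa assignment."""
    taxon: Optional[str]
    taxa_source: str
    warnings: List[str] = field(default_factory=list)


def assign_taxa(ani_hits: List[Tuple[str, float]],
                kraken_taxon: Optional[str]) -> TaxaCall:
    """Pick a taxon from FastANI hits, else fall back to the kraken2 call.

    ani_hits: (taxon, ani_percent) pairs for the top MASH candidates
              (assumed input shape, not PHX's real data structure).
    kraken_taxon: taxon kraken2 assigned from the weighted scaffolds.
    """
    if not ani_hits:
        # No MASH candidates at all: report the kraken2 call with a warning.
        return TaxaCall(kraken_taxon, "kraken2_wtasmbld", ["No MASH hit found"])

    best_taxon, best_ani = max(ani_hits, key=lambda hit: hit[1])
    if best_ani < ANI_THRESHOLD:
        # Candidates exist, but none is a good enough match.
        return TaxaCall(kraken_taxon, "kraken2_wtasmbld",
                        ["No hits with >=80% ANI"])

    # A good FastANI match wins and no warning is raised.
    return TaxaCall(best_taxon, "ANI_REFSEQ")
```

With a yeast sample, for example, `assign_taxa([], "Homo sapiens")` would take the first branch and carry the "No MASH hit found" warning.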

We are also working on a new entry point in PHX that will have a more limited database, which will make picking a neg control easier. This is the version we plan to do our validation with, but I don't have a timeline for that yet; most likely early 2024.

So your options right now are:

  1. Wait and validate the new entry point in PHX with a non-HAI bacterial isolate.
  2. Use the neg control you have already picked with v2.1.0, which will come out soon. For this you will need to write into your validation plan that a result counts as negative when you get the warning "No MASH hit found" or "No hits with >=80% ANI." In either case the "Taxa_Source" column in the GRiPHin summary will state "kraken2_wtasmbld" rather than "ANI_REFSEQ". In other words, you would only accept a taxa ID when it was made by FastANI.
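
If you go with option 2, the acceptance rule could be scripted along these lines. This is only a sketch under a few assumptions: each GRiPHin summary row has already been read into a dict keyed by column name, the "WARNINGS" cell holds the warning text as a plain string, and `is_negative_result` is a made-up helper name. Requiring both the warning and the kraken2 source is a deliberately conservative choice of mine, not something PHX enforces.

```python
# Warning strings and Taxa_Source values as quoted in this thread.
NEGATIVE_WARNINGS = ("No MASH hit found", "No hits with >=80% ANI")
KRAKEN_SOURCE = "kraken2_wtasmbld"


def is_negative_result(row: dict) -> bool:
    """Return True when a summary row should count as a negative control result.

    row: one summary row as a dict keyed by column name (assumed layout).
    """
    warnings = str(row.get("WARNINGS", ""))
    source = str(row.get("Taxa_Source", "")).strip()

    # Substring match, since the exact formatting of the cell is an assumption.
    has_warning = any(text in warnings for text in NEGATIVE_WARNINGS)
    return has_warning and source == KRAKEN_SOURCE
```

A row with "Taxa_Source" set to "ANI_REFSEQ" is never treated as negative here, which matches the rule of only accepting a taxa ID made by FastANI.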

Let me know your thoughts about this.