CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

add new phoenix qc entrypoint #83

Closed rpetit3 closed 1 year ago

rpetit3 commented 1 year ago

This PR adds a new entry point for PHoeNIx, called PHOENIX_QC. It lets users process their samples with other pipelines and then determine whether those samples pass the PHoeNIx QC metrics.
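For a sense of what "QC metrics" can be read from, the sketch below pulls a few post-filtering read counts out of a fastp JSON report. It relies on fastp's standard `summary.after_filtering` section. The helper names `fastp_read_summary` and `meets_min_reads` are hypothetical. The metrics PHoeNIx actually checks, and their thresholds, are defined by the pipeline itself, so no default threshold is assumed here.

```python
import json


def fastp_read_summary(json_path):
    """Return a few post-filtering read metrics from a fastp JSON report.

    Hypothetical helper for illustration only; PHoeNIx's own QC logic
    may use different fields and thresholds.
    """
    with open(json_path) as handle:
        report = json.load(handle)
    # fastp writes per-stage totals under summary.before_filtering / after_filtering.
    after = report["summary"]["after_filtering"]
    return {
        "total_reads": after["total_reads"],
        "total_bases": after["total_bases"],
        "q30_rate": after["q30_rate"],
    }


def meets_min_reads(summary, min_reads):
    """Check a caller-supplied minimum read count (no PHoeNIx default assumed)."""
    return summary["total_reads"] >= min_reads
```

A check like this would run against the `fastp_pass_json` file listed in the samplesheet, with thresholds taken from the pipeline's own configuration.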

PHOENIX_QC requires a different samplesheet CSV from the main PHoeNIx workflow, with the following columns:

    sample             A sample name for the input
    fastq_1            R1 of reads run through Fastp
    fastq_2            R2 of reads run through Fastp
    fastp_pass_json    JSON output from initial Fastp run
    fastp_failed_json  JSON output from rerun of Fastp on failed reads
    spades             Assembly created by SPAdes
    mlst               TSV output from the mlst tool
    quast              TSV report generated by QUAST
    amrfinderplus      TSV report generated by AMRFinderPlus
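Because PHOENIX_QC consumes outputs produced elsewhere, a quick pre-flight check of the samplesheet can catch problems before launching the pipeline. The sketch below checks that the nine columns listed above are present, that every sample name is non-empty and unique, and that no path is blank. It can optionally confirm that each path exists locally. `validate_samplesheet` is a hypothetical helper, not part of PHoeNIx, and the pipeline's own input validation may be stricter or different.

```python
import csv
from pathlib import Path

# Column names as listed in the PR description for PHOENIX_QC input.
EXPECTED_COLUMNS = [
    "sample", "fastq_1", "fastq_2", "fastp_pass_json", "fastp_failed_json",
    "spades", "mlst", "quast", "amrfinderplus",
]


def validate_samplesheet(path, check_files=False):
    """Return a list of problems found in a PHOENIX_QC-style samplesheet.

    Hypothetical helper; line numbers assume no multi-line quoted fields.
    """
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in EXPECTED_COLUMNS if c not in header]
        if missing:
            problems.append(f"missing columns: {', '.join(missing)}")
            return problems
        seen = set()
        # Data rows start on line 2 (line 1 is the header).
        for line_no, row in enumerate(reader, start=2):
            # Short rows yield None for missing fields, so coerce to "".
            name = (row["sample"] or "").strip()
            if not name:
                problems.append(f"line {line_no}: empty sample name")
            elif name in seen:
                problems.append(f"line {line_no}: duplicate sample '{name}'")
            seen.add(name)
            for column in EXPECTED_COLUMNS[1:]:
                value = (row[column] or "").strip()
                if not value:
                    problems.append(f"line {line_no}: empty '{column}'")
                elif check_files and not Path(value).is_file():
                    problems.append(
                        f"line {line_no}: file not found for '{column}': {value}"
                    )
    return problems
```

Returning a list of problems, rather than raising on the first one, reports every issue in a single pass, which is convenient when a samplesheet covers many samples. The `check_files` option only tests local paths, so leave it off for remote (e.g. cloud storage) URIs.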

The remaining steps in PHOENIX_QC match the rest of the PHoeNIx pipeline. I thought it best to keep analyses that rely on PHoeNIx-specific databases (e.g. kraken, gamma) handled by PHoeNIx itself.

For now, this PR targets the v1.0.1 branch.

Happy to take any suggestions and feedback.