add new phoenix qc entrypoint

rpetit3 commented 1 year ago

Duplicate of https://github.com/CDCgov/phoenix/pull/83

This PR adds a new entry point for PHoeNIx called PHOENIX_QC. It lets users who have already run their samples through other pipelines determine whether those samples pass PHoeNIx QC metrics.

PHOENIX_QC requires a different type of CSV file with the following columns:

    sample             Sample name for the input
    fastq_1            R1 reads after processing with Fastp
    fastq_2            R2 reads after processing with Fastp
    fastp_pass_json    JSON output from the initial Fastp run
    fastp_failed_json  JSON output from rerunning Fastp on the failed reads
    spades             Assembly created by SPAdes
    mlst               TSV output from the mlst tool
    quast              TSV report generated by QUAST
    amrfinderplus      TSV report generated by AMRFinderPlus
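To make the expected input concrete, here is a minimal sketch of a pre-flight check for a PHOENIX_QC-style CSV with the columns listed above. The helper name `read_qc_samplesheet` and the example file names are hypothetical; this is not the validation PHoeNIx itself performs, only an illustration of the column layout.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names taken from the PHOENIX_QC input description above.
REQUIRED_COLUMNS = [
    "sample", "fastq_1", "fastq_2", "fastp_pass_json", "fastp_failed_json",
    "spades", "mlst", "quast", "amrfinderplus",
]


def read_qc_samplesheet(handle: TextIO) -> List[Dict[str, str]]:
    """Hypothetical helper: parse the CSV and check that every column is present and filled."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {', '.join(missing)}")

    rows: List[Dict[str, str]] = []
    seen = set()
    # Row numbers count the header as row 1 (assumes one record per line).
    for row_no, row in enumerate(reader, start=2):
        # DictReader fills short rows with None, so treat None as empty.
        empty = [col for col in REQUIRED_COLUMNS if not (row.get(col) or "").strip()]
        if empty:
            raise ValueError(f"row {row_no}: empty value(s) for {', '.join(empty)}")
        name = row["sample"].strip()
        if name in seen:
            raise ValueError(f"row {row_no}: duplicate sample name {name!r}")
        seen.add(name)
        rows.append({col: row[col].strip() for col in REQUIRED_COLUMNS})
    return rows


if __name__ == "__main__":
    # Example file names are made up for illustration.
    example = io.StringIO(
        ",".join(REQUIRED_COLUMNS) + "\n"
        "s1,s1_R1.fq.gz,s1_R2.fq.gz,s1.fastp.json,s1.failed.fastp.json,"
        "s1.fasta,s1.mlst.tsv,s1.quast.tsv,s1.amr.tsv\n"
    )
    rows = read_qc_samplesheet(example)
    print(rows[0]["spades"])  # prints s1.fasta
```

Failing early on a missing column or a duplicate sample name keeps the error close to the input file rather than surfacing later inside a pipeline process.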

The remaining steps in PHOENIX_QC match the rest of the PHOENIX pipeline. I thought it best to let analyses that rely on PHoeNIx-specific databases (e.g. kraken, gamma) still be handled by PHoeNIx.

Right now I've pointed this at the v1.2.0-dev branch.

Happy to take any suggestions and feedback.

jvhagey commented 1 year ago

@rpetit3, in my mind the output of this would just tell you whether your samples passed QC: you would get back a file listing PASS/FAIL and the reason for each sample. Why does it then run further analyses? Our team is developing an entry point (-entry scaffolds) that takes in assemblies, checks some QC metrics, and then runs every step after SPAdes. It seems like there might be some overlap with what you had in mind?
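The PASS/FAIL-with-reason output described above could look like the minimal sketch below. It assumes per-sample metrics have already been extracted into a dict; the metric names (`coverage`, `assembly_length`), the thresholds, and the helpers `min_value` and `qc_report` are all hypothetical and are not PHoeNIx's actual QC criteria or code.

```python
import csv
import sys
from typing import Callable, Dict, List, Optional, TextIO

# Hypothetical check type: returns a failure reason, or None if the sample passes.
Check = Callable[[Dict[str, float]], Optional[str]]


def min_value(metric: str, threshold: float) -> Check:
    """Build a check that fails when `metric` is missing or below `threshold`."""
    def check(metrics: Dict[str, float]) -> Optional[str]:
        value = metrics.get(metric)
        if value is None:
            return f"{metric} missing"
        if value < threshold:
            return f"{metric} {value} < {threshold}"
        return None
    return check


def qc_report(samples: Dict[str, Dict[str, float]], checks: List[Check], out: TextIO) -> None:
    """Write one TSV row per sample: name, PASS/FAIL, and semicolon-separated reasons."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "status", "reasons"])
    for name, metrics in samples.items():
        # Collect every failure reason so the report explains all problems at once.
        reasons = [r for r in (check(metrics) for check in checks) if r is not None]
        writer.writerow([name, "FAIL" if reasons else "PASS", "; ".join(reasons)])


if __name__ == "__main__":
    # Hypothetical metrics and thresholds, for illustration only.
    samples = {
        "s1": {"coverage": 45, "assembly_length": 5_000_000},
        "s2": {"coverage": 20},
    }
    checks = [min_value("coverage", 30), min_value("assembly_length", 1_000_000)]
    qc_report(samples, checks, sys.stdout)
```

Keeping each check as a small function makes it easy to add or swap criteria without touching the report writer.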

rpetit3 commented 1 year ago

There are some steps in PHOENIX that require custom databases (e.g. kraken and gamma). I thought it best to have PHOENIX handle those for the user, rather than having other pipelines integrate them into their own workflows.

add new phoenix qc entrypoint #93