StaPH-B / docker-builds

:package: :whale: Dockerfiles and documentation on tools for public health bioinformatics
GNU General Public License v3.0

adds stxtyper 1.0.24 #1045

Closed kapsakcj closed 1 month ago

kapsakcj commented 1 month ago

Will fill this out later and mark ready for review after I've finalized things

This PR adds stxtyper v1.0.24, the first versioned release of stxtyper, a new tool that detects and types Shiga toxin genes in bacterial genome assemblies. It also attempts to detect novel Shiga toxin subtypes when the detected sequences diverge from the reference sequences.

These genes are usually found in E. coli (STEC), but can also be found in Shigella species and, more rarely, in other genera such as Klebsiella. The tool is developed by NCBI in collaboration with a number of groups, including CDC, FDA, SSI, and others. A publication fully describing the tool and its validation is in the works, but a software release has been made so the community can test the software further and begin using it.

⚠️ I would caution against (clinical) reporting of results from this tool unless the user has performed a validation. The tool is performing well in our hands, but we of course advise caution if/when reporting results.


kapsakcj commented 1 month ago

This is not urgent, so don't worry about reviewing soon, but I wanted to let y'all know that this PR is ready for review.

erinyoung commented 1 month ago

Looks like the tests worked:

#9 [test 1/2] RUN tblastn -version && stxtyper --version && stxtyper --help && cd /stxtyper && bash test_stxtyper.sh
#9 0.084 tblastn: 2.12.0+
#9 0.084  Package: blast 2.12.0, build Mar  8 2022 16:19:08
#9 0.088 1.0.24
#9 0.090 Determine stx type(s) of a genome, print .tsv-file
#9 0.090 
#9 0.090 USAGE:   stxtyper [--nucleotide NUC_FASTA] [--name NAME] [--output OUTPUT_FILE] [--blast_bin BLAST_DIR] [--amrfinder] [--print_node] [--nucleotide_output NUC_FASTA_OUT] [--debug] [--log LOG] [--quiet]
#9 0.090 HELP:    stxtyper --help or stxtyper -h
#9 0.090 VERSION: stxtyper --version or stxtyper -v
#9 0.090 
#9 0.090 NAMED PARAMETERS
#9 0.090 -n NUC_FASTA, --nucleotide NUC_FASTA
#9 0.090     Input nucleotide FASTA file (can be gzipped)
#9 0.090 --name NAME
#9 0.090     Text to be added as the first column "name" to all rows of the report, for example it can be an assembly name
#9 0.090 -o OUTPUT_FILE, --output OUTPUT_FILE
#9 0.090     Write output to OUTPUT_FILE instead of STDOUT
#9 0.090 --blast_bin BLAST_DIR
#9 0.090     Directory for BLAST. Deafult: $BLAST_BIN
#9 0.090 --amrfinder
#9 0.090     Print output in the nucleotide AMRFinderPlus format
#9 0.090 --print_node
#9 0.090     Print AMRFinderPlus hierarchy node
#9 0.090 --nucleotide_output NUC_FASTA_OUT
#9 0.090     Output nucleotide FASTA file of reported nucleotide sequences
#9 0.090 --debug
#9 0.090     Integrity checks
#9 0.090 --log LOG
#9 0.090     Error log file, appended, opened on application start
#9 0.090 -q, --quiet
#9 0.090     Suppress messages to STDERR
#9 0.090 
#9 0.090 Temporary directory used is $TMPDIR or "/tmp"
#9 0.092 Testing ./stxtyper
#9 0.092   To test stxtyper in your path run 'test_stxtyper.sh path'
#9 0.098 Running: ./stxtyper --nucleotide_output test/basic.nuc_out.got -n test/basic.fa
#9 0.098 Software directory: '/stxtyper/'
#9 0.098 Version: 1.0.24
#9 11.60 stxtyper took 11 seconds to complete
#9 11.60 ok: test/basic.fa
#9 11.60 ok: --nucleotide_output test/basic.nuc_out.got options worked
#9 11.61 Running: ./stxtyper -n test/synthetics.fa
#9 11.61 Software directory: '/stxtyper/'
#9 11.61 Version: 1.0.24
#9 23.42 stxtyper took 12 seconds to complete
#9 23.43 ok: test/synthetics.fa
#9 23.43 Running: ./stxtyper -n test/virulence_ecoli.fa
#9 23.43 Software directory: '/stxtyper/'
#9 23.43 Version: 1.0.24
#9 42.10 stxtyper took 19 seconds to complete
#9 42.11 ok: test/virulence_ecoli.fa
#9 42.11 Running: ./stxtyper -n test/cases.fa
#9 42.11 Software directory: '/stxtyper/'
#9 42.11 Version: 1.0.24
#9 53.74 stxtyper took 11 seconds to complete
#9 53.75 ok: test/cases.fa
#9 53.75 Running: ./stxtyper --amrfinder -n test/amrfinder_integration.fa
#9 53.75 Software directory: '/stxtyper/'
#9 53.75 Version: 1.0.24
#9 65.34 stxtyper took 12 seconds to complete
#9 65.34 ok: test/amrfinder_integration.fa
#9 65.35 Running: ./stxtyper --amrfinder --print_node -n test/amrfinder_integration2.fa
#9 65.35 Software directory: '/stxtyper/'
#9 65.35 Version: 1.0.24
#9 76.99 stxtyper took 11 seconds to complete
#9 76.99 ok: test/amrfinder_integration2.fa
#9 76.99 Done.
#9 76.99 
#9 76.99 
#9 76.99 ok: all 7 stxtyper tests passed
#9 DONE 77.0s

#10 [test 2/2] RUN echo "downloading test genome & running through stxtyper..." && wget -q https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/224/845/GCA_012224845.2_ASM1222484v2/GCA_012224845.2_ASM1222484v2_genomic.fna.gz && stxtyper -n GCA_012224845.2_ASM1222484v2_genomic.fna.gz | tee test-result.tsv && grep 'stx2o' test-result.tsv | grep 'COMPLETE'
#10 0.071 downloading test genome & running through stxtyper...
#10 0.221 Running: stxtyper -n GCA_012224845.2_ASM1222484v2_genomic.fna.gz
#10 0.221 Software directory: '/stxtyper/'
#10 0.221 Version: 1.0.24
#10 24.27 #target_contig    stx_type    operon  identity    target_start    target_stop target_strand   A_reference A_reference_subtype A_identity  A_coverage  B_reference B_reference_subtype B_identity  B_coverage
#10 24.27 CP113091.1    stx2o   COMPLETE    100.00  2085533 2086768 +   WAK52085.1  stxA2o  100.00  100.00  QZL10983.1  stxB2o  100.00  100.00
#10 24.27 stxtyper took 24 seconds to complete
#10 24.27 CP113091.1    stx2o   COMPLETE    100.00  2085533 2086768 +   WAK52085.1  stxA2o  100.00  100.00  QZL10983.1  stxB2o  100.00  100.00
#10 DONE 24.3s
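
The second test checks the result with `grep 'stx2o' | grep 'COMPLETE'`, which matches those strings anywhere on a line. For downstream use, a column-aware check may be safer. Below is a minimal Python sketch, assuming the report is tab-separated with the header shown in the log above; the function name `find_stx_hits` and the `SAMPLE` text are hypothetical and only reuse the values from this test run.

```python
def find_stx_hits(tsv_text, stx_type, operon="COMPLETE"):
    """Return report rows whose stx_type and operon columns match exactly.

    Assumes tab-separated stxtyper output whose first line is the header
    (the first column name is written as "#target_contig").
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    if not lines:
        return []
    # Drop the leading "#" so the first key is plain "target_contig".
    header = lines[0].lstrip("#").split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    return [r for r in rows if r.get("stx_type") == stx_type and r.get("operon") == operon]


# Illustrative input rebuilt from the header and result row in the log above.
SAMPLE = "\n".join([
    "\t".join([
        "#target_contig", "stx_type", "operon", "identity", "target_start",
        "target_stop", "target_strand", "A_reference", "A_reference_subtype",
        "A_identity", "A_coverage", "B_reference", "B_reference_subtype",
        "B_identity", "B_coverage",
    ]),
    "\t".join([
        "CP113091.1", "stx2o", "COMPLETE", "100.00", "2085533", "2086768", "+",
        "WAK52085.1", "stxA2o", "100.00", "100.00",
        "QZL10983.1", "stxB2o", "100.00", "100.00",
    ]),
])

if __name__ == "__main__":
    for hit in find_stx_hits(SAMPLE, "stx2o"):
        print(hit["target_contig"], hit["stx_type"], hit["operon"])
```

Unlike the `grep` pipeline, this only counts a row when `stx2o` is in the `stx_type` column and `COMPLETE` is in the `operon` column of the same row.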

erinyoung commented 1 month ago

I'm going to

  1. merge this PR
  2. deploy as 'stxtyper' with tags '1.0.24' and 'latest'

erinyoung commented 1 month ago

Thank you for putting this together! You can check the status of the deploy here: https://github.com/StaPH-B/docker-builds/actions/runs/10910085372
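
Once the deploy finishes, running the image on a local assembly might look roughly like the sketch below. It only builds the `docker run` argument list and does not execute anything. The image name `staphb/stxtyper:1.0.24` is an assumption (only the name 'stxtyper' and the tags are stated in this thread), and the helper `build_stxtyper_command` and the `/data` mount point are hypothetical choices for illustration. The `-n`, `-o`, and `--name` options are taken from the help text in the test log.

```python
from pathlib import Path


def build_stxtyper_command(assembly, output, image="staphb/stxtyper:1.0.24", name=None):
    """Build a `docker run` argument list that runs stxtyper on one assembly.

    The default image name is an assumption, not confirmed in this thread.
    The assembly's directory is mounted at /data (a hypothetical mount point)
    and used as the working directory, so the output lands next to the input.
    """
    assembly = Path(assembly).resolve()
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{assembly.parent}:/data",  # mount the input directory
        "-w", "/data",                     # run inside the mounted directory
        image,
        "stxtyper", "-n", assembly.name, "-o", output,
    ]
    if name is not None:
        # --name adds a first "name" column to every row of the report.
        cmd += ["--name", name]
    return cmd


if __name__ == "__main__":
    cmd = build_stxtyper_command(
        "GCA_012224845.2_ASM1222484v2_genomic.fna.gz",
        "stxtyper.tsv",
        name="GCA_012224845.2",
    )
    print(" ".join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it, provided Docker is installed and the image exists under that name.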