CDCgov / phoenix

🔥🐦🔥 PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

Missing output file(s) *_best_MASH_hits.txt #49

Closed wchen190 closed 2 years ago

wchen190 commented 2 years ago

The following error was encountered while running the latest version.

Steps to reproduce the behavior:

~/sandbox$ nextflow run /mnt/WGSData/phoenix/main.nf -profile docker -entry PHOENIX --input samplesheet.csv --kraken2db /mnt/WGSData/phoenix/assets/databases/
N E X T F L O W  ~  version 22.04.3
Launching /mnt/WGSData/phoenix/main.nf [big_cajal] DSL2 - revision: ef1e7f30a2



cdcgov/phoenix v1.0.0

Core Nextflow options
  runName        : big_cajal
  containerEngine: docker
  launchDir      : /home/EADS/wchen/sandbox
  workDir        : /home/EADS/wchen/sandbox/work
  projectDir     : /mnt/WGSData/phoenix
  userName       : wchen
  profile        : docker
  configFiles    : /mnt/WGSData/phoenix/nextflow.config

Required Options
  input          : samplesheet.csv
  kraken2db      : /mnt/WGSData/phoenix/assets/databases/

Optional options
  outdir         : /home/EADS/wchen/sandbox/results
  busco_db_path  : null

Institutional config options
  custom_config_version: master

!! Only displaying parameters that differ from the pipeline defaults !!

If you use cdcgov/phoenix for your analysis please cite:

Caused by: Missing output file(s) *_best_MASH_hits.txt expected by process PHOENIX:PHOENIX_EXTERNAL:DETERMINE_TOP_TAXA (VS220805-1287-220805-M70822-SC2_S1)

Command executed:

sort_and_prep_dist.sh -a VS220805-1287-220805-M70822-SC2_S1.filtered.scaffolds.fa.gz -x VS220805-1287-220805-M70822-SC2_S1.txt -d ./

Command exit status: 0

Command output:
  Option -a triggered, argument = VS220805-1287-220805-M70822-SC2_S1.filtered.scaffolds.fa.gz
  Option -x triggered, argument = VS220805-1287-220805-M70822-SC2_S1.txt
  Option -d triggered, argument = ./
  Cutoff IS: 1
  'Catharanthus_roseus'_GCF_004214875.1_ASM421487v1_genomic.fna VS220805-1287-220805-M70822-SC2_S1.filtered.scaffolds.fa.gz 1 1 0/1000
  dist-1 - 'Catharanthus_roseus'_GCF_004214875.1_ASM421487v1_genomic.fna

Work dir: /home/EADS/wchen/sandbox/work/c9/8549b0d52907110a0d0889b13e91c6

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Logs: Log files are attached (.nextflow.log, .command.log, samplesheet.csv). The .command.err file is empty.

jvhagey commented 2 years ago

Hi @wchen190, the .command.log file says the organism is "Catharanthus_roseus", which is a plant and is the root of the issue (pun intended). What did you expect this organism to be? Also, that same line ends with 0/1000, which means there were no k-mer matches anywhere. Per @nvlachos, this likely has one of two causes: a REALLY bad assembly or an organism outside the bacterial world. However, we use all of the bacterial genomes in RefSeq, so a bacterial sample should match something with at least 1 k-mer. Along the same lines, @Alyssa-Kent has had issues with ANI in the past when the assembly was very poor. You should have a folder for that sample containing the files for the processes that were able to finish. Can you have a look at/provide the following files:

Getting a look at these can help us sort out what went wrong with the assembly, and depending on how they look we might want to put some checks in place to flag something like this. @nvlachos and @Alyssa-Kent might have other files they want to see as well to determine how to proceed.
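A check of the kind mentioned above could be as simple as reading the last column of the distance line in the command output. The sketch below is only an illustration, not PHoeNIx code: the names `MashHit`, `parse_mash_dist_line`, and `has_no_kmer_matches` are hypothetical, and it assumes the five-column layout seen in the log (reference, query, distance, p-value, matched/total) with no spaces in the file names.

```python
from typing import NamedTuple


class MashHit(NamedTuple):
    """One row of distance output (hypothetical container for this sketch)."""
    reference: str
    query: str
    distance: float
    p_value: float
    matched: int      # numerator of the last column, e.g. 0 in "0/1000"
    sketch_size: int  # denominator of the last column, e.g. 1000


def parse_mash_dist_line(line: str) -> MashHit:
    """Split a five-column distance line; assumes no spaces in file names."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    reference, query, distance, p_value, shared = fields
    matched, sketch_size = (int(n) for n in shared.split("/"))
    return MashHit(reference, query, float(distance), float(p_value),
                   matched, sketch_size)


def has_no_kmer_matches(hit: MashHit) -> bool:
    """True when the top hit shares nothing with the query (the 0/1000 case)."""
    return hit.matched == 0


# The line reported in the command output of this issue.
line = ("'Catharanthus_roseus'_GCF_004214875.1_ASM421487v1_genomic.fna "
        "VS220805-1287-220805-M70822-SC2_S1.filtered.scaffolds.fa.gz 1 1 0/1000")
hit = parse_mash_dist_line(line)
if has_no_kmer_matches(hit):
    print(f"WARNING: {hit.query} has {hit.matched}/{hit.sketch_size} matches; "
          "check the assembly or the sample type")
```

Flagging the sample with a warning, rather than letting the process fail on a missing output file, would point the user at the likely cause directly.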

wchen190 commented 2 years ago

Thanks Jill, see attached for the requested files. I put them in a zip file since I can't upload the html file. files.zip

jvhagey commented 2 years ago

@wchen190, per the Kraken output this looks like a SARS-CoV-2 sample. SC2 is in the file name, and the assembled genome is about 33 kbp, which is close to what is expected for SC2 (29.9 kbp), so I think there was just a mix-up in what you thought this sample was. If you can confirm this, I will close this issue, as it doesn't seem to be a bug in the software in this case.
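The size comparison above could also be automated as a rough sanity check on the assembly. This is a minimal sketch under stated assumptions, not something PHoeNIx is known to do: the function names are hypothetical, and `MIN_EXPECTED_BP` is an arbitrary illustrative cutoff rather than a pipeline setting.

```python
import gzip
from typing import Iterable

# Hypothetical cutoff for illustration only; not a PHoeNIx parameter.
MIN_EXPECTED_BP = 100_000


def total_assembly_length(fasta_lines: Iterable[str]) -> int:
    """Sum the lengths of all sequence lines, skipping '>' header lines."""
    return sum(len(line.strip()) for line in fasta_lines
               if not line.startswith(">"))


def looks_too_small(total_bp: int, min_bp: int = MIN_EXPECTED_BP) -> bool:
    """True when the assembly is shorter than the chosen cutoff."""
    return total_bp < min_bp


def assembly_length_from_gz(path: str) -> int:
    """Read a gzipped FASTA (such as a *.scaffolds.fa.gz file) and total it."""
    with gzip.open(path, "rt") as handle:
        return total_assembly_length(handle)


# Tiny in-memory example instead of a real assembly file.
example = [">scaffold_1", "ACGT" * 5, ">scaffold_2", "GG"]
size = total_assembly_length(example)
print(size, "bp; too small:", looks_too_small(size))
```

An assembly of about 33 kbp, like the one discussed here, would fall well below this illustrative cutoff and could be flagged before the taxonomy step runs.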

wchen190 commented 2 years ago

@jvhagey, you're right, they're SARS-CoV-2 samples. Sorry for the mix-up. I just tested with another set of samples and it completed successfully. Thanks, Jill!