CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

[BUG] - calculate assembly ratio error #101

Closed erinyoung closed 1 year ago

erinyoung commented 1 year ago

Describe the bug

PHoeNIx fails while calculating the assembly ratio for sample 2022CK-00833.

Impact

Error executing process > 'PHOENIX:PHOENIX_EXTERNAL:CALCULATE_ASSEMBLY_RATIO (2022CK-00833)'

Caused by:
  Missing output file(s) `*_GC_content_*.txt` expected by process `PHOENIX:PHOENIX_EXTERNAL:CALCULATE_ASSEMBLY_RATIO (2022CK-00833)`

Command executed:

  calculate_assembly_ratio.sh -d NCBI_Assembly_stats_20220928.txt -q 2022CK-00833_report.tsv -x 2022CK-00833.tax -s 2022CK-00833 

  cat <<-END_VERSIONS > versions.yml
  "PHOENIX:PHOENIX_EXTERNAL:CALCULATE_ASSEMBLY_RATIO":
      NCBI Assembly Stats DB: NCBI_Assembly_stats_20220928.txt
  END_VERSIONS

Command exit status:
  0

Command output:
  Option -d triggered, argument = NCBI_Assembly_stats_20220928.txt
  Option -q triggered, argument = 2022CK-00833_report.tsv
  Option -x triggered, argument = 2022CK-00833.tax
  Option -s triggered, argument = 2022CK-00833
  Checking if quast Assembly_stats exists: 2022CK-00833_report.tsv
  Checking if Tax summary exists: 2022CK-00833.tax
  No expected length was found to compare to

Command error:
  WARNING: DEPRECATED USAGE: Environment variable SINGULARITYENV_NXF_DEBUG will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  sed: preserving permissions for './sedzif6jd': Operation not permitted
  sed: preserving permissions for './sedABGP6L': Operation not permitted

Work dir:
  /Volumes/IDGenomics_NAS/testing_phoenix/2023-03-07/work/09/a0fd5e4149ac42e970b007b3b456ac

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

To Reproduce The FASTQ files for this isolate can be found on the SRA under accession SRR23080371.

Expected behavior No error?

Screenshots Error is pasted above.

Logs If applicable, please attach logs to help describe your problem. These would be the .command.err, .command.out, .command.sh and/or the .nextflow.log files associated with the run/process that failed. The .nextflow.log file is in the directory you ran the pipeline from. The .command.XXX files are found in <directory you ran pipeline from>/work/XX/xxxxxxxxxxxxxx, where the x and X are a random string of letters/numbers associated with the process. As the pipeline runs on the CLI, you will see the beginning of these strings to the left of the process that is running.

I have attached the files here. I renamed them so that they can be uploaded to GitHub issues (i.e., no file names starting with '.', and I changed the extensions). command.log.txt command.out.txt command.run.txt command.sh.txt command.trace.txt command.begin.txt command.err.txt


nvlachos commented 1 year ago

Hey @erinyoung! Thanks for reporting this issue. This happens because there is no reference for Paenibacillus urinalis in the assembly ratio database, so the script exits with status 0 without writing the `*_GC_content_*.txt` file that the process expects. We knew this was inevitably going to happen, but thought it was guarded against. We'll work on getting the pipeline to react properly in this situation. This specific isolate also brought up a new scenario where NOTHING in the ANI database passes the reporting thresholds. We'll have to dig into this one after clearing up the first issue.
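
For readers following along, here is a minimal sketch of one way a script could guard against a missing reference: always write the output file that Nextflow expects, with placeholder values when the species is not in the database. This is not the actual `calculate_assembly_ratio.sh` code or the fix that was shipped. The function names, the two-column tab-separated database layout, and the report fields are all assumptions made for illustration; only the `*_GC_content_*.txt` filename pattern comes from the error message above.

```shell
#!/usr/bin/env bash
# Sketch only: the real script and database layout may differ.
# ASSUMPTION (hypothetical layout): tab-separated file with the species name
# in column 1 and the expected assembly length in column 2.

# Print the expected length for a species, or nothing if the species is absent.
lookup_expected_length() {
    local db="$1" species="$2"
    awk -F'\t' -v sp="$species" '$1 == sp { print $2; exit }' "$db"
}

# Always write <sample>_GC_content_<stamp>.txt so that the process output glob
# matches, even when the database has no entry for the species.
# The report fields below are hypothetical placeholders.
write_gc_report() {
    local db="$1" species="$2" sample="$3" stamp="$4"
    local out="${sample}_GC_content_${stamp}.txt"
    local expected
    expected="$(lookup_expected_length "$db" "$species")"
    if [ -z "$expected" ]; then
        # No reference found: record NA instead of exiting without output.
        printf 'Species: %s\nExpected_length: NA\nReference_found: no\n' \
            "$species" > "$out"
    else
        printf 'Species: %s\nExpected_length: %s\nReference_found: yes\n' \
            "$species" "$expected" > "$out"
    fi
    # Report the file that was written.
    printf '%s\n' "$out"
}
```

With this shape, a call such as `write_gc_report db.txt "Paenibacillus urinalis" 2022CK-00833 20220928` would still produce a file matching the glob, and downstream steps could check for `NA` rather than failing on a missing output.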

nvlachos commented 1 year ago

Closing this ticket. The issue was resolved by updating the sketch file database used for ANI. Other improvements to handling low/missing species have been added to the v2.1.0-dev branch and will be available when that branch becomes a stable release.