CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

[BUG] - CUSTOM_DUMPSOFTWAREVERSIONS #79

Closed slsevilla closed 1 year ago

slsevilla commented 1 year ago

Describe the bug: CUSTOM_DUMPSOFTWAREVERSIONS fails with a "Command exit status: 1" error.

Impact: The pipeline fails when it attempts to run this process.

To Reproduce: Running on an AWS instance (Linux-based).

I'm attempting to run the test samples using the following command:

./nextflow run cdcgov/phoenix -r v1.0.0 -profile docker,test -entry PHOENIX --kraken2db /home/ubuntu/phoenix/

Problem: Workflow process output

executor >  local (40)
[f7/5f303a] process > PHOENIX:PHOENIX_EXTERNAL:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)                  [100%] 1 of 1 ✔
[b3/3ee68f] process > PHOENIX:PHOENIX_EXTERNAL:ASSET_CHECK                                                      [100%] 1 of 1 ✔
[c3/c40f81] process > PHOENIX:PHOENIX_EXTERNAL:BBDUK (Test_Sample)                                              [100%] 1 of 1 ✔
[a2/fa4a98] process > PHOENIX:PHOENIX_EXTERNAL:FASTP_TRIMD (Test_Sample)                                        [100%] 1 of 1 ✔
[51/8a4a20] process > PHOENIX:PHOENIX_EXTERNAL:FASTP_SINGLES (Test_Sample)                                      [100%] 1 of 1 ✔
[47/cb204b] process > PHOENIX:PHOENIX_EXTERNAL:GATHERING_READ_QC_STATS (Test_Sample)                            [100%] 1 of 1 ✔
[14/eaab2a] process > PHOENIX:PHOENIX_EXTERNAL:FASTQCTRIMD (Test_Sample)                                        [100%] 1 of 1 ✔
[6a/f4168c] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KRAKEN2_TRIMD (Test_Sample)                        [100%] 1 of 1 ✔
[7c/ed3737] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KREPORT2MPA_TRIMD (Test_Sample)                    [100%] 1 of 1 ✔
[52/f867cd] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KREPORT2KRONA_TRIMD (Test_Sample)                  [100%] 1 of 1 ✔
[3c/5860e1] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KRONA_KTIMPORTTEXT_TRIMD (Test_Sample)             [100%] 1 of 1 ✔
[03/590836] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KRAKEN2_BH_TRIMD (Test_Sample)                     [100%] 1 of 1 ✔
[3c/aa77ed] process > PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:SPADES (Test_Sample)                                   [100%] 1 of 1 ✔
[-        ] process > PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:DETERMINE_TAXA_ID_FAILURE                              -
[-        ] process > PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:GENERATE_PIPELINE_STATS_FAILURE                        -
[-        ] process > PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:CREATE_SUMMARY_LINE_FAILURE                            -
[53/7b7568] process > PHOENIX:PHOENIX_EXTERNAL:RENAME_FASTA_HEADERS (Test_Sample)                               [100%] 1 of 1 ✔
[8f/df75b2] process > PHOENIX:PHOENIX_EXTERNAL:BBMAP_REFORMAT (Test_Sample)                                     [100%] 1 of 1 ✔
[af/b4cd75] process > PHOENIX:PHOENIX_EXTERNAL:MLST (Test_Sample)                                               [100%] 1 of 1 ✔
[da/60c842] process > PHOENIX:PHOENIX_EXTERNAL:GAMMA_HV (Test_Sample)                                           [100%] 1 of 1 ✔
[09/28f8de] process > PHOENIX:PHOENIX_EXTERNAL:GAMMA_AR (Test_Sample)                                           [100%] 1 of 1 ✔
[09/cbbcdf] process > PHOENIX:PHOENIX_EXTERNAL:GAMMA_PF (Test_Sample)                                           [100%] 1 of 1 ✔
[8c/7a09d5] process > PHOENIX:PHOENIX_EXTERNAL:QUAST (Test_Sample)                                              [100%] 1 of 1 ✔
[15/c12f81] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRAKEN2_WTASMBLD (Test_Sample)                  [100%] 1 of 1 ✔
[0c/77a837] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRAKENTOOLS_MAKEKREPORT (Test_Sample)           [100%] 1 of 1 ✔
[6e/cc21f0] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KREPORT2KRONA_WTASMBLD (Test_Sample)            [100%] 1 of 1 ✔
[45/0b5705] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRAKEN2_BH_WTASMBLD (Test_Sample)               [100%] 1 of 1 ✔
[fc/6768af] process > PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRONA_KTIMPORTTEXT_WTASMBLD (Test_Sample)       [100%] 1 of 1 ✔
[fe/16c31e] process > PHOENIX:PHOENIX_EXTERNAL:MASH_DIST (Test_Sample)                                          [100%] 1 of 1 ✔
[73/cf1707] process > PHOENIX:PHOENIX_EXTERNAL:DETERMINE_TOP_TAXA (Test_Sample)                                 [100%] 1 of 1 ✔
[66/37d3dc] process > PHOENIX:PHOENIX_EXTERNAL:FASTANI (Test_Sample)                                            [100%] 1 of 1 ✔
[1b/143508] process > PHOENIX:PHOENIX_EXTERNAL:FORMAT_ANI (Test_Sample)                                         [100%] 1 of 1 ✔
[b9/9ab945] process > PHOENIX:PHOENIX_EXTERNAL:DETERMINE_TAXA_ID (Test_Sample)                                  [100%] 1 of 1 ✔
[db/ca9eb5] process > PHOENIX:PHOENIX_EXTERNAL:PROKKA (Test_Sample)                                             [100%] 1 of 1 ✔
[de/606b4f] process > PHOENIX:PHOENIX_EXTERNAL:AMRFINDERPLUS_UPDATE (update)                                    [100%] 1 of 1 ✔
[de/ceec98] process > PHOENIX:PHOENIX_EXTERNAL:GET_TAXA_FOR_AMRFINDER (Test_Sample)                             [100%] 1 of 1 ✔
[c6/064831] process > PHOENIX:PHOENIX_EXTERNAL:AMRFINDERPLUS_RUN (Test_Sample)                                  [100%] 1 of 1 ✔
[97/ec86f8] process > PHOENIX:PHOENIX_EXTERNAL:CALCULATE_ASSEMBLY_RATIO (Test_Sample)                           [100%] 1 of 1 ✔
[ce/a12c32] process > PHOENIX:PHOENIX_EXTERNAL:GENERATE_PIPELINE_STATS_WF:GENERATE_PIPELINE_STATS (Test_Sample) [100%] 1 of 1 ✔
[12/e8d51b] process > PHOENIX:PHOENIX_EXTERNAL:CREATE_SUMMARY_LINE (Test_Sample)                                [100%] 1 of 1 ✔
[a9/3264d5] process > PHOENIX:PHOENIX_EXTERNAL:FETCH_FAILED_SUMMARIES                                           [100%] 1 of 1 ✔
[5e/fd137e] process > PHOENIX:PHOENIX_EXTERNAL:GATHER_SUMMARY_LINES (1)                                         [100%] 1 of 1 ✔
[32/2489a0] process > PHOENIX:PHOENIX_EXTERNAL:CUSTOM_DUMPSOFTWAREVERSIONS (1)                                  [100%] 1 of 1, failed: 1 ✘
[-        ] process > PHOENIX:PHOENIX_EXTERNAL:MULTIQC                                                          [  0%] 0 of 1

Error details below:

Execution cancelled -- Finishing pending tasks before exit
-[cdcgov/phoenix] Pipeline completed with errors-
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
Error executing process > 'PHOENIX:PHOENIX_EXTERNAL:CUSTOM_DUMPSOFTWAREVERSIONS (1)'

Caused by:
  Process `PHOENIX:PHOENIX_EXTERNAL:CUSTOM_DUMPSOFTWAREVERSIONS (1)` terminated with an error exit status (1)

Command executed [/home/ubuntu/.nextflow/assets/cdcgov/phoenix/./workflows/../modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py]:

  #!/usr/bin/env python

  import yaml
  import platform
  from textwrap import dedent

  def _make_versions_html(versions):
      html = [
          dedent(
              """\
              <style>
              #nf-core-versions tbody:nth-child(even) {
                  background-color: #f2f2f2;
              }
              </style>
              <table class="table" style="width:100%" id="nf-core-versions">
                  <thead>
                      <tr>
                          <th> Process Name </th>
                          <th> Software </th>
                          <th> Version  </th>
                      </tr>
                  </thead>
              """
          )
      ]
      for process, tmp_versions in sorted(versions.items()):
          html.append("<tbody>")
          for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
              html.append(
                  dedent(
                      f"""\
                      <tr>
                          <td><samp>{process if (i == 0) else ''}</samp></td>
                          <td><samp>{tool}</samp></td>
                          <td><samp>{version}</samp></td>
                      </tr>
                      """
                  )
              )
          html.append("</tbody>")
      html.append("</table>")
      return "\n".join(html)

  versions_this_module = {}
  versions_this_module["PHOENIX:PHOENIX_EXTERNAL:CUSTOM_DUMPSOFTWAREVERSIONS"] = {
      "python": platform.python_version(),
      "yaml": yaml.__version__,
  }

  with open("collated_versions.yml") as f:
      #versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) | versions_this_module
      versions_by_process = yaml.load(f, Loader=yaml.BaseLoader)
      versions_by_process= {**versions_by_process, **versions_this_module}

  # aggregate versions by the module name (derived from fully-qualified process name)
  versions_by_module = {}
  for process, process_versions in versions_by_process.items():
      module = process.split(":")[-1]
      try:
          assert versions_by_module[module] == process_versions, (
              "We assume that software versions are the same between all modules. "
              "If you see this error-message it means you discovered an edge-case "
              "and should open an issue in nf-core/tools. "
          )
      except KeyError:
          versions_by_module[module] = process_versions

  versions_by_module["Workflow"] = {
      "Nextflow": "22.10.4",
      "cdcgov/phoenix": "1.0.0",
  }

  versions_mqc = {
      "id": "software_versions",
      "section_name": "cdcgov/phoenix Software Versions",
      "section_href": "https://github.com/cdcgov/phoenix",
      "plot_type": "html",
      "description": "are collected at run time from the software output.",
      "data": _make_versions_html(versions_by_module),
  }

  with open("software_versions.yml", "w") as f:
      yaml.dump(versions_by_module, f, default_flow_style=False)
  with open("software_versions_mqc.yml", "w") as f:
      yaml.dump(versions_mqc, f, default_flow_style=False)

  with open("versions.yml", "w") as f:
      yaml.dump(versions_this_module, f, default_flow_style=False)

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 55, in <module>
      versions_by_process = yaml.load(f, Loader=yaml.BaseLoader)
    File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 114, in load
      return loader.get_single_data()
    File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 49, in get_single_data
      node = self.get_single_node()
    File "/usr/local/lib/python3.9/site-packages/yaml/composer.py", line 36, in get_single_node
      document = self.compose_document()
    File "/usr/local/lib/python3.9/site-packages/yaml/composer.py", line 55, in compose_document
      node = self.compose_node(None, None)
    File "/usr/local/lib/python3.9/site-packages/yaml/composer.py", line 84, in compose_node
      node = self.compose_mapping_node(anchor)
    File "/usr/local/lib/python3.9/site-packages/yaml/composer.py", line 127, in compose_mapping_node
      while not self.check_event(MappingEndEvent):
    File "/usr/local/lib/python3.9/site-packages/yaml/parser.py", line 98, in check_event
      self.current_event = self.state()
    File "/usr/local/lib/python3.9/site-packages/yaml/parser.py", line 438, in parse_block_mapping_key
      raise ParserError("while parsing a block mapping", self.marks[-1],
  yaml.parser.ParserError: while parsing a block mapping
    in "collated_versions.yml", line 1, column 1
  expected <block end>, but found '<block mapping start>'
    in "collated_versions.yml", line 16, column 4

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Adding output of collated_versions.yml below:

uf$ cat collated_versions.yml
"PHOENIX:PHOENIX_EXTERNAL:QUAST":
    quast: 5.0.2
"PHOENIX:PHOENIX_EXTERNAL:DETERMINE_TAXA_ID":
    NCBI Taxonomy Reference File: taxes_20220315.csv
"PHOENIX:PHOENIX_EXTERNAL:AMRFINDERPLUS_RUN":
    amrfinderplus: 3.10.40
"PHOENIX:PHOENIX_EXTERNAL:CREATE_SUMMARY_LINE":
    python: 3.10.6
"PHOENIX:PHOENIX_EXTERNAL:RENAME_FASTA_HEADERS":
    python: 3.10.6
"PHOENIX:PHOENIX_EXTERNAL:CALCULATE_ASSEMBLY_RATIO":
    NCBI Assembly Stats DB: NCBI_Assembly_stats_20220928.txt
"PHOENIX:PHOENIX_EXTERNAL:AMRFINDERPLUS_UPDATE":
    amrfinderplus: 3.10.40
    amrfinderplus_db_version: 2022-10-11.2
   "PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRAKEN2_WTASMBLD":
       kraken2: 2.1.2
       kraken2db: phoenix
   END_VERSIONS
"PHOENIX:PHOENIX_EXTERNAL:MASH_DIST":
    mash: 2.3
    Mash Sketch: REFSEQ_20220915_Bacteria_complete.msh
"PHOENIX:PHOENIX_EXTERNAL:FASTANI":
    fastani:  1.33
"PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRONA_KTIMPORTTEXT_WTASMBLD":
    krona: 2.8.1
"PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRAKENTOOLS_MAKEKREPORT":
    python: 3.10.6
    krakentools_makekreport: 1.2
"PHOENIX:PHOENIX_EXTERNAL:INPUT_CHECK:SAMPLESHEET_CHECK":
    python: 3.10.6
   "PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KRAKEN2_TRIMD":
       kraken2: 2.1.2
       kraken2db: phoenix
   END_VERSIONS
"PHOENIX:PHOENIX_EXTERNAL:GATHER_SUMMARY_LINES":
    python: 3.10.6
"PHOENIX:PHOENIX_EXTERNAL:GAMMA_PF":
    gamma: 2.1
    Database: PF-Replicons_20220916.fasta
"PHOENIX:PHOENIX_EXTERNAL:BBMAP_REFORMAT":
    bbmap: 38.96
"PHOENIX:PHOENIX_EXTERNAL:GAMMA_HV":
    gamma: 2.1
    Database: HyperVirulence_20220414.fasta
"PHOENIX:PHOENIX_EXTERNAL:BBDUK":
    bbmap: 38.96
"PHOENIX:PHOENIX_EXTERNAL:PROKKA":
    prokka: 1.14.5
"PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KRONA_KTIMPORTTEXT_TRIMD":
    krona: 2.8.1
"PHOENIX:PHOENIX_EXTERNAL:GAMMA_AR":
    gamma: 2.1
    Database: ResGANNCBI_20220915_srst2.fasta
"PHOENIX:PHOENIX_EXTERNAL:FASTQCTRIMD":
    fastqc: 0.11.9
"PHOENIX:PHOENIX_EXTERNAL:MLST":
    mlst: 2.22.1
"PHOENIX:PHOENIX_EXTERNAL:FASTP_SINGLES":
    fastp: 0.23.2
"PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KREPORT2MPA_TRIMD":
    python: 3.10.6
    krakentools: 1.2
"PHOENIX:PHOENIX_EXTERNAL:FASTP_TRIMD":
    fastp: 0.23.2
"PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:SPADES":
    spades: 3.15.5
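
The parser's complaint at line 16, column 4 corresponds to the first indented "PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRAKEN2_WTASMBLD" key in the listing above (three leading spaces, so the key starts in column 4). A minimal standard-library sketch of a check for the two anomalies visible in this file, a quoted process key that does not start at column 0 and a bare END_VERSIONS line, might look like the following. The helper name find_suspect_lines is hypothetical and is not part of PHoeNIx or nf-core; it also assumes the layout seen in this issue, with quoted process names as top-level keys.

```python
def find_suspect_lines(text):
    """Return (line_number, reason) pairs for lines that look malformed.

    Assumes the layout seen in this issue: each process name is a quoted
    key starting at column 0, followed by indented "tool: version" lines.
    """
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "END_VERSIONS":
            # A marker line that should never reach the YAML file.
            suspects.append((number, "stray END_VERSIONS marker"))
        elif stripped.startswith('"') and stripped.endswith('":') and line[0].isspace():
            # A process key that is indented instead of starting at column 0.
            suspects.append((number, "indented process key"))
    return suspects


# Excerpt copied from the collated_versions.yml shown in this issue.
excerpt = (
    '"PHOENIX:PHOENIX_EXTERNAL:AMRFINDERPLUS_UPDATE":\n'
    "    amrfinderplus: 3.10.40\n"
    "    amrfinderplus_db_version: 2022-10-11.2\n"
    '   "PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_WTASMBLD:KRAKEN2_WTASMBLD":\n'
    "       kraken2: 2.1.2\n"
    "       kraken2db: phoenix\n"
    "   END_VERSIONS\n"
    '"PHOENIX:PHOENIX_EXTERNAL:MASH_DIST":\n'
    "    mash: 2.3\n"
)

for number, reason in find_suspect_lines(excerpt):
    print(f"line {number}: {reason}")
```

To run the same check on a real work directory, the file contents can be passed in directly, for example find_suspect_lines(open("collated_versions.yml").read()).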

Please let me know what other information you might need!

jvhagey commented 1 year ago

@slsevilla @nvlachos and I have seen this before. I believe the error is coming from the kraken step. The END_VERSIONS line and the odd indentation for the kraken step shouldn't be in the collated_versions.yml file. I don't know why that is happening, though, as I can't recreate the error (the only difference is that I have a different database location, obviously). There isn't some extra whitespace or something similar being added to the end of the database argument you pass, is there?

slsevilla commented 1 year ago

Hi - This is the exact command I'm running:

./nextflow run cdcgov/phoenix -r v1.0.0 -profile docker,test -entry PHOENIX --kraken2db /home/ubuntu/phoenix/

So, to answer your question: no, I'm not adding any extra spacing, just the absolute path of the phoenix db.

slsevilla commented 1 year ago

I'm finding that the error persists even with a new dataset. I'm now attempting a run that bypasses the issue by commenting out the line that adds this versions.yml to the final report, but I'm not sure whether this will fix the problem.

from: ~/.nextflow/assets/cdcgov/phoenix/workflows/cdc_phoenix.nf

    KRAKEN2_TRIMD (
        FASTP_TRIMD.out.reads, "trimd", GATHERING_READ_QC_STATS.out.fastp_total_qc, []
    )
    // ch_versions = ch_versions.mix(KRAKEN2_TRIMD.out.versions)

The error comes from the versions.yml file that this step creates, although I can't quite figure out why it is causing the issue here.

$ cat versions.yml
   "PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KRAKEN2_TRIMD":
       kraken2: 2.1.2
       kraken2db: kraken2db
   END_VERSIONS
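
The literal END_VERSIONS line suggests that this file is written by a shell here-document whose closing delimiter was never matched. That is an inference from the output, not something confirmed in this thread. With the <<- form of a here-document, the shell strips leading tabs only, so a delimiter indented with spaces is treated as ordinary content and the indentation of every line is kept. The sketch below illustrates that shell behaviour from Python. It assumes a POSIX sh is available on the PATH, and the script text is a made-up stand-in (EXAMPLE_PROCESS is a placeholder), not the actual kraken2.nf module.

```python
import subprocess


def run_heredoc(indent):
    """Run a tiny `cat <<-END_VERSIONS` script whose lines start with `indent`."""
    script = (
        "cat <<-END_VERSIONS\n"
        f'{indent}"EXAMPLE_PROCESS":\n'
        f"{indent}    kraken2: 2.1.2\n"
        f"{indent}END_VERSIONS\n"
    )
    # stderr is captured because some shells warn about an unterminated heredoc.
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=False
    )
    return result.stdout


tab_indented = run_heredoc("\t")     # leading tabs are stripped by <<-
space_indented = run_heredoc("   ")  # leading spaces are not

print(repr(tab_indented))
print(repr(space_indented))
```

With tab indentation the delimiter is recognized and the key starts at column 0. With space indentation the leading spaces are preserved and END_VERSIONS ends up in the output, which matches the shape of the file above.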

I'd appreciate any additional thoughts you might have.

slsevilla commented 1 year ago

For anyone else who encounters this issue: I don't understand why it's happening, but I do have a solution.

Edit the following file:

.nextflow/assets/cdcgov/phoenix/modules/local/kraken2.nf

Change the command that writes versions.yml to the following:

echo -e "${task.process}:\n    kraken2: \$(echo \$(kraken2 --version 2>&1) | sed 's/^.*Kraken version //; s/ .*\$//') \n    kraken2db: $db" > versions.yml
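
For comparison, the layout that command aims for is a process key at column 0 followed by indented "tool: version" lines. A small Python sketch of that layout is given below, using a hypothetical render_versions helper and the kraken2 values from the collated_versions.yml earlier in this thread. Quoting the key, as the other entries in that file do, is a choice made here and differs from the echo command, which leaves the key unquoted.

```python
def render_versions(process, versions, indent="    "):
    """Format one versions.yml entry: a quoted key, then indented pairs."""
    lines = [f'"{process}":']
    for tool, version in versions.items():
        lines.append(f"{indent}{tool}: {version}")
    return "\n".join(lines) + "\n"


entry = render_versions(
    "PHOENIX:PHOENIX_EXTERNAL:KRAKEN2_TRIMD:KRAKEN2_TRIMD",
    {"kraken2": "2.1.2", "kraken2db": "phoenix"},
)
print(entry, end="")
```

Because the text is assembled line by line rather than through a here-document, there is no closing delimiter that could be missed and no inherited indentation in front of the key.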

This fixed the formatting issue for me and allowed the run to complete. Since it seems to be an issue only I'm having, and you haven't been able to reproduce it, I'm closing the issue.