Error 137 (out of memory?) during report generation

osilander commented 8 months ago

Operating System

CentOS 7

Other Linux

No response

Workflow Version

v1.1.0-g137d59e

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

# Also used singularity (deprecated on this cluster) in addition to apptainer
# slurm and apptainer profiles are in the nextflow.config
# pipeline works with this new config on other samples
# slurm params
process.executor = 'slurm'
process.memory = '16 GB'
process.time = '24:00:00'
process.cpus = 4  // Number of threads per task
task.cpus = 4    // Number of CPUs allocated by SLURM for each task
task.max = 128   
nextflow run epi2me-labs/wf-16s -profile apptainer,slurm -c nextflow.config -resume > output.log 2> error.log

Workflow Execution - CLI Execution Profile

custom

What happened?

No report was generated, even though the abundance tables were produced. When I ran the workflow with only BC01-BC19 (as opposed to all 96 barcodes), the report was generated without issue.

Relevant log output

executor >  slurm (1)
[c6/fb14d8] process > fastcat (96)                                   [100%] 96 of 96, cached: 96 ✔
[skipped  ] process > prepare_databases:download_unpack_taxonomy     [100%] 1 of 1, stored: 1 ✔
[skipped  ] process > prepare_databases:download_reference_ref2taxid [100%] 1 of 1, stored: 1 ✔
[91/2f0383] process > minimap_pipeline:run_common:getVersions        [100%] 1 of 1, cached: 1 ✔
[0a/5ce371] process > minimap_pipeline:run_common:getParams          [100%] 1 of 1, cached: 1 ✔
[31/3a9da6] process > minimap_pipeline:minimap (barcode47)           [100%] 96 of 96, cached: 96 ✔
[63/425cee] process > minimap_pipeline:createAbundanceTables         [100%] 1 of 1, cached: 1 ✔
[36/5f1e33] process > minimap_pipeline:makeReport (1)                [100%] 1 of 1, failed: 1 ✘
[2d/b03a65] process > minimap_pipeline:output_results (3)            [100%] 3 of 3, cached: 3 ✔
ERROR ~ Error executing process > 'minimap_pipeline:makeReport (1)'

Caused by:
  Process `minimap_pipeline:makeReport (1)` terminated with an error exit status (137)

Command executed:

  workflow-glue report         "wf-16s-report.html"         --workflow_name wf-16s         --versions versions         --params params.json         --read_stats read_stats/*         --lineages lineages         --abundance_table "abundance_table_genus.tsv"         --taxonomic_rank "G"         --pipeline "minimap2"         --abundance_threshold "1"        --n_taxa_barplot "12"

Command exit status:
  137

Command output:
  (empty)

Command error:
  [00:31:11 - matplotlib] Matplotlib created a temporary cache directory at /dev/shm/jobs/44142076/matplotlib-l793tp2d because the default path (/home/osilande/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  [00:31:11 - matplotlib.font_manager] generated new fontManager
  /home/osilande/.nextflow/assets/epi2me-labs/wf-16s/bin/workflow_glue/__init__.py:30: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
    logger.warn(f"Could not load {name} due to missing module {e.name}")
  [00:31:12 - workflow_glue] Could not load abundance_tables due to missing module anytree
  [00:31:12 - workflow_glue] Starting entrypoint.
  .command.sh: line 2:    43 Killed                  workflow-glue report "wf-16s-report.html" --workflow_name wf-16s --versions versions --params params.json --read_stats read_stats/* --lineages lineages --abundance_table "abundance_table_genus.tsv" --taxonomic_rank "G" --pipeline "minimap2" --abundance_threshold "1" --n_taxa_barplot "12"

Work dir:
  /scale_wlg_nobackup/filesets/nobackup/uoa03387/AG1270/work/36/5f1e33f646a88960925467112acc7b



Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

nggvs commented 8 months ago

Hi @osilander, thank you very much for opening the issue. As I mentioned in the other issue, exit status 137 is usually caused by the process running out of memory. At the moment, each process has default resource values, so a process can fail if your samples are larger than we expected. You can override these values by providing a custom Nextflow config (using -c) that assigns more memory to the failing process. For example:

nextflow_custom.config

process {
  withName: makeReport {
    memory = 8.GB
  }
}
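
If the memory needed is hard to predict (it seems to grow with the number of barcodes, since the 19-barcode run succeeded), a common Nextflow pattern is to retry the failing process with more memory on each attempt. The following is only a minimal sketch, not the workflow's own configuration. The 8 GB starting value, the retry count of 2, and the choice to retry only on exit status 137 (128 + 9, meaning the task was killed with SIGKILL) are assumptions to adjust for your cluster.

```groovy
// nextflow_custom.config (sketch; the values are assumptions, not wf-16s defaults)
process {
  withName: makeReport {
    // Request 8 GB on the first attempt, 16 GB on the second, 24 GB on the third
    memory = { 8.GB * task.attempt }
    // Retry only when the task was killed (137 = 128 + SIGKILL); stop the run otherwise
    errorStrategy = { task.exitStatus == 137 ? 'retry' : 'terminate' }
    // Allow at most two retries after the first failure
    maxRetries = 2
  }
}
```

This file is passed with -c in the same way as the fixed-memory example above. The requested memory still has to fit within what the SLURM partition allows per job.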

Let me know if this fixes your problem. Thank you very much!

osilander commented 8 months ago

resolved :)

nggvs commented 8 months ago

Glad to hear that! Please open a new issue if you detect something else. Thank you very much for using the workflow!
