epi2me-labs / wf-transcriptomes

Error executing process > 'pipeline:makeReport (1)' error exit status (137) #116

Open afazhra opened 1 month ago

afazhra commented 1 month ago

I’m getting Error executing process > 'pipeline:makeReport (1)' with every version of wf-transcriptomes I have tried, from the oldest to the newest. My system has 62 GB of memory and 32 cores and runs Ubuntu 20.04.6 LTS. The file I’m attempting to upload is a 28.7 GB FASTQ file, and I'm using a single thread for the process.

Any suggestions or guidance on resolving this would be greatly appreciated.

Thank you

Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 137.

The full error message was:

Error executing process > 'pipeline:makeReport (1)'

Caused by:
  Process pipeline:makeReport (1) terminated with an error exit status (137)

Command executed:

  if [ -f "de_report/OPTIONAL_FILE" ]; then
      dereport=""
  else
      dereport="--de_report true --de_stats "seqkit/""
      mv de_report/.gf de_report/stringtie_merged.gtf
  fi
  if [ -f "gff_annotation/OPTIONAL_FILE" ]; then
      OPT_GFF_ANNOTATION=""
  else
      OPT_GFF_ANNOTATION="--gff_annotation gff_annotation/"
  fi
  if [ -f "gffcmp_dir/OPTIONAL_FILE" ]; then
      OPT_GFFCMP_DIR=""
  else
      OPT_GFFCMP_DIR="--gffcompare_dir gffcmp_dir/"
  fi
  if [ -f "aln_stats/OPTIONAL_FILE" ]; then
      OPT_ALN=""
  else
      OPT_ALN="--alignment_stats aln_stats/"
  fi
  if [ -f "pychopper_report/OPTIONAL_FILE" ]; then
      OPT_PC_REPORT=""
  else
      OPT_PC_REPORT="--pychop_report pychopper_report/"
  fi
  if [ -f "isoforms_table/OPTIONAL_FILE" ]; then
      OPT_ISO_TABLE=""
  else
      OPT_ISO_TABLE="--isoform_table isoforms_table"
  fi
  workflow-glue report --report wf-transcriptomes-report.html --versions versions.txt --params params.json ${OPT_ALN} ${OPT_PC_REPORT} --stats per_read_stats/ ${OPT_GFF_ANNOTATION} ${OPT_ISO_TABLE} ${OPT_GFFCMP_DIR} --isoform_table_nrows 5000 ${dereport}

Command exit status:
  137

Command output:
  (empty)

Command error:
  [03:13:28 - workflow_glue] Bootstrapping CLI.
  /opt/custflow/epi2meuser/conda/lib/python3.8/site-packages/gffutils/parser.py:19: DeprecationWarning: invalid escape sequence \w
    gff3_kw_pat = re.compile('\w+=')
  [03:13:30 - workflow_glue] Starting entrypoint.
  .command.sh: line 33: 30 Killed workflow-glue report --report wf-transcriptomes-report.html --versions versions.txt --params params.json ${OPT_ALN} ${OPT_PC_REPORT} --stats per_read_stats/* ${OPT_GFF_ANNOTATION} ${OPT_ISO_TABLE} ${OPT_GFFCMP_DIR} --isoform_table_nrows 5000 ${dereport}

Work dir:
  /home/p2solo/epi2melabs/instances/wf-transcriptomes_01J8KFT05TC72YM7QQVMA49A2T/work/f8/637931dcb6240f13fa6728696fc20f

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
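
For context on the exit status: shells conventionally report a command killed by signal N as 128 + N, so 137 corresponds to signal 9 (SIGKILL), which matches the "Killed" line in the command error. SIGKILL is what the Linux out-of-memory killer or a container memory limit typically sends, but the status alone does not prove memory was the cause. A minimal Python sketch of that decoding, assuming a POSIX system (the function name `describe_exit_status` is illustrative, not part of the workflow):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status using the 128 + N signal convention."""
    if status > 128:
        try:
            # Map the signal number back to its symbolic name (POSIX only).
            sig = signal.Signals(status - 128)
        except ValueError:
            return f"exit status {status}: no known signal for {status - 128}"
        return f"exit status {status}: terminated by {sig.name} (signal {sig.value})"
    return f"exit status {status}: the command exited by itself"


# The two statuses that appear in this thread.
print(describe_exit_status(137))
print(describe_exit_status(1))
```

On Linux the first call reports SIGKILL, while status 1 is an ordinary error exit from the command itself.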

nrhorner commented 1 month ago

Hi @afazhra

I'm sorry you're experiencing issues with the workflow. There will be an update shortly that should fix your issue. In the meantime could you try the following:

1) Create a file called memory.config (for example) with the following contents:

process {
    withName:makeReport {
        memory = 16.GB
    }
}

2) Run your workflow with the following additions to the command (-resume allows the report to be made without running the whole workflow again):

-c memory.config -resume
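
The two steps can be sketched in Python as follows, assuming the workflow is launched from a command line. The base command shown is a hypothetical placeholder (use whatever command you originally ran), and the helper names are illustrative. Note that Nextflow's resume option is spelled `-resume`, with a single dash.

```python
from pathlib import Path
import shlex

# Contents of the extra config file from step 1.
MEMORY_CONFIG = """\
process {
    withName:makeReport {
        memory = 16.GB
    }
}
"""


def write_memory_config(path="memory.config"):
    """Write the config override to disk and return its path."""
    target = Path(path)
    target.write_text(MEMORY_CONFIG)
    return target


def build_command(base_args, config_path="memory.config"):
    """Append the extra config file and the resume flag to an existing command."""
    return list(base_args) + ["-c", str(config_path), "-resume"]


# Hypothetical base command: substitute the arguments of your original run.
base = ["nextflow", "run", "epi2me-labs/wf-transcriptomes"]
print(shlex.join(build_command(base)))
```

Calling `write_memory_config()` in the launch directory before running the printed command covers step 1.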

Thanks,

Neil

afazhra commented 1 week ago

Thank you so much for your help, it solved the problem.

afazhra commented 1 week ago

Hi @nrhorner, sorry to bother you again. I’ve hit another error, shown in the report below, even though I'm using 30 threads and the memory configuration you provided earlier.

Thank you.

Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'pipeline:differential_expression:deAnalysis (1)'

Caused by:
  Process `pipeline:differential_expression:deAnalysis (1)` terminated with an error exit status (1)

Command executed:

  mkdir merged
  mkdir de_analysis
  de_analysis.R annotation.gtf 3 1 10 3 "sample_sheet.csv"

Command exit status:
  1

Command output:
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids include versions.
  Loading annotation database.
  Filtering counts using DRIMSeq.
  Building model matrix.
  Sum transcript counts into gene counts.
  Running differential gene expression analysis using edgeR.
  Running differential transcript usage analysis using DEXSeq.

Command error:
  package 'DRIMSeq' was built under R version 4.3.2 
  Warning messages:
  1: package 'GenomicFeatures' was built under R version 4.3.2 
  2: package 'BiocGenerics' was built under R version 4.3.2 
  3: package 'S4Vectors' was built under R version 4.3.3 
  4: package 'IRanges' was built under R version 4.3.3 
  5: package 'GenomeInfoDb' was built under R version 4.3.2 
  6: package 'GenomicRanges' was built under R version 4.3.3 
  7: package 'AnnotationDbi' was built under R version 4.3.2 
  8: package 'Biobase' was built under R version 4.3.3 
  Warning messages:
  1: package 'edgeR' was built under R version 4.3.3 
  2: package 'limma' was built under R version 4.3.3 
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids include versions.
  Loading annotation database.
  Import genomic features from the file as a GRanges object ... OK
  Prepare the 'metadata' data frame ... OK
  Make the TxDb object ... OK
  Warning message:
  In .get_cds_IDX(mcols0$type, mcols0$phase) :
    The "phase" metadata column contains non-NA values for features of type
    stop_codon. This information was ignored.
  'select()' returned 1:many mapping between keys and columns
  Filtering counts using DRIMSeq.
  Building model matrix.
  Sum transcript counts into gene counts.
  Warning message:
  package 'dplyr' was built under R version 4.3.3 
  Running differential gene expression analysis using edgeR.
  Running differential transcript usage analysis using DEXSeq.
  Warning messages:
  1: package 'DEXSeq' was built under R version 4.3.3 
  2: package 'BiocParallel' was built under R version 4.3.3 
  3: package 'SummarizedExperiment' was built under R version 4.3.2 
  4: package 'MatrixGenerics' was built under R version 4.3.3 
  5: package 'matrixStats' was built under R version 4.3.3 
  6: package 'DESeq2' was built under R version 4.3.3 
  7: package 'RColorBrewer' was built under R version 4.3.3 
  converting counts to integer mode
  Warning message:
  In DESeqDataSet(rse, design, ignoreRank = TRUE) :
    some variables in design formula are characters, converting to factors
  Error in estimateSizeFactorsForMatrix(featureCounts(object), locfunc,  : 
    every gene contains at least one zero, cannot compute log geometric means
  Calls: estimateSizeFactors ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix
  Execution halted

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
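
For context on the final error: the message comes from DESeq2's size-factor estimation, which DEXSeq calls. The median-of-ratios method takes the log geometric mean of each feature across samples, and a feature with a zero count in any sample is excluded because its log geometric mean is negative infinity. If every feature has at least one zero, nothing is left to estimate from. The sketch below is a simplified pure-Python re-implementation for illustration only, not the R code the workflow runs, and the count matrices are made up. Why the counts in this run are that sparse cannot be told from the log.

```python
import math
from statistics import median


def log_geometric_means(counts):
    """Row-wise mean of log counts; a row containing a zero gives -inf."""
    result = []
    for row in counts:
        if any(c == 0 for c in row):
            result.append(float("-inf"))
        else:
            result.append(sum(math.log(c) for c in row) / len(row))
    return result


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column)."""
    lgm = log_geometric_means(counts)
    usable = [i for i, value in enumerate(lgm) if math.isfinite(value)]
    if not usable:
        raise ValueError(
            "every gene contains at least one zero, "
            "cannot compute log geometric means"
        )
    factors = []
    for j in range(len(counts[0])):
        log_ratios = [math.log(counts[i][j]) - lgm[i] for i in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up matrices: rows are features, columns are samples.
dense = [[10, 20], [30, 60]]
sparse = [[0, 5], [3, 0]]

print(size_factors(dense))
try:
    size_factors(sparse)
except ValueError as err:
    print("failed:", err)
```

In the dense matrix the second sample has twice the counts of the first, so its size factor is larger. In the sparse matrix every row contains a zero, which reproduces the failure mode.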

sejmodha commented 18 hours ago

@nrhorner I have hit the exact same issue of the makeReport process running out of RAM. I tried applying the memory.config fix to cap RAM at around 16 GB, but that did not change anything.

Any other suggestions on how to fix this? I am running the workflow with the -profile singularity setting on a server with 64 GB of RAM.

Thanks for your help in advance. Sej