epi2me-labs / wf-somatic-variation

Error executing process > 'mod:dss' #27

Open SilviaMariaMacri opened 3 months ago

SilviaMariaMacri commented 3 months ago

Operating System

Other Linux (please specify below)

Other Linux

Red Hat Enterprise Linux release 8.6

Workflow Version

v.1.2.1

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

/hpcshare/genomics/ASL_ONC/NextFlow_RunningDir/nextflow-23.10.0-all run epi2me-labs/wf-somatic-variation -profile singularity -resume -process.executor pbspro -process.memory 256.GB -work-dir /archive/s2/genomics/onco_nanopore/test_som_var/work -with-timeline --snv --sv --mod --sample_name OHU0002HI --bam_normal /archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002HTNDN/OHU0002HTNDN_dx0_dx-1_new.bam --bam_tumor /archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002ITTDN/OHU0002ITTDN_dx0_dx-1_new.bam --ref /archive/s1/sconsRequirements/databases/reference/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta --out_dir /archive/s2/genomics/onco_nanopore/test_som_var --basecaller_cfg dna_r10.4.1_e8.2_400bps_sup@v4.2.0 --phase_normal --classify_insert --force_strand --normal_min_coverage 0 --tumor_min_coverage 0 --haplotype_filter_threads 32 --severus_threads 32 --dss_threads 4 --modkit_threads 32 -process.cpus 32 -process.queue fatnodes

Workflow Execution - CLI Execution Profile

singularity

What happened?

The pipeline failed in its last step, mod:dss.

Replicating the issue as suggested by the error message (running "bash .command.run" in the process work directory) produced more information:

System errno 22 unmapping file: Invalid argument
Error in fread("normal.bed", sep = "\t", header = T) :
  Opened 15.96GB (17139453993 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.
Execution halted

Relevant log output

Error executing process > 'mod:dss (3)'

Caused by:
  Process `mod:dss (3)` terminated with an error exit status (137)

Command executed:

  #!/usr/bin/env Rscript
  library(DSS)
  require(bsseq)
  require(data.table)
  # Disable scientific notation
  options(scipen=999)

  # Import data
  tumor = fread("tumor.bed", sep = '\t', header = T)
  normal = fread("normal.bed", sep = '\t', header = T)
  # Create BSobject
  BSobj = makeBSseqData( list(tumor, normal),
      c("Tumor", "Normal") )
  # DML testing
  dmlTest = DMLtest(BSobj, 
      group1=c("Tumor"), 
      group2=c("Normal"),
      equal.disp = FALSE,
      smoothing=TRUE,
      smoothing.span=500,
      ncores=4)
  # Compute DMLs
  dmls = callDML(dmlTest,
      delta=0.25,
      p.threshold=0.001)
  # Compute DMRs
  dmrs = callDMR(dmlTest,
      delta=0.25,
      p.threshold=0.001,
      minlen=100,
      minCG=5,
      dis.merge=1500,
      pct.sig=0.5)
  # Write output files
  write.table(dmls, 'OHU0002HI.6mA_+.dml.tsv', sep='\t', quote=F, col.names=T, row.names=F)
  write.table(dmrs, 'OHU0002HI.6mA_+.dmr.tsv', sep='\t', quote=F, col.names=T, row.names=F)

Command exit status:
  137

Command output:
  (empty)

Command error:

      anyMissing, rowMedians

  Attaching package: 'MatrixGenerics'

  The following objects are masked from 'package:matrixStats':

      colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
      colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
      colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
      colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
      colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
      colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
      colWeightedMeans, colWeightedMedians, colWeightedSds,
      colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
      rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
      rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
      rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
      rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
      rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
      rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
      rowWeightedSds, rowWeightedVars

  The following object is masked from 'package:Biobase':

      rowMedians

  Loading required package: parallel
  Loading required package: data.table

  Attaching package: 'data.table'

  The following object is masked from 'package:SummarizedExperiment':

      shift

  The following object is masked from 'package:GenomicRanges':

      shift

  The following object is masked from 'package:IRanges':

      shift

  The following objects are masked from 'package:S4Vectors':

      first, second

  .command.run: line 164:    35 Killed                  /usr/bin/env Rscript .command.sh

Work dir:
  /archive/s2/genomics/onco_nanopore/test_som_var/work/20/a4581d28e28dd29ec5e3e0e78d757f

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

no

Other demo data information

no attempt

RenzoTale88 commented 3 months ago

@SilviaMariaMacri the process is running out of memory (exit status 137). You can try reducing the number of threads for the DSS process with --dss_threads 2, which should reduce the amount of memory required.
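
For context, exit status 137 follows the common shell convention that a status above 128 means the process was ended by a signal whose number is the status minus 128. 137 therefore corresponds to signal 9 (SIGKILL), which is what a kernel out-of-memory killer or a scheduler enforcing a memory limit typically sends. A minimal sketch of that decoding follows; decode_exit_status is a hypothetical helper name, not part of the workflow:

```shell
#!/usr/bin/env bash
# Decode a process exit status into the signal that ended it, if any.
# Convention: a status above 128 means "terminated by signal (status - 128)".
decode_exit_status() {
    local status="$1"
    if [ "$status" -gt 128 ]; then
        # kill -l <number> prints the signal name without the SIG prefix
        echo "signal $(kill -l "$((status - 128))")"
    else
        echo "exit code $status"
    fi
}

decode_exit_status 137   # prints "signal KILL"
decode_exit_status 143   # prints "signal TERM"
decode_exit_status 130   # prints "signal INT"
```

A SIGKILL by itself does not prove the cause was memory, so it is worth checking the scheduler's accounting for the job as well.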

SilviaMariaMacri commented 3 months ago

@RenzoTale88 thank you for your reply. I reduced --dss_threads first to 2 and then to 1, but it gave me the same error.

RenzoTale88 commented 3 months ago

Then you can try increasing the memory provided to the DSS process. Simply save the following block of code in a separate file:

process {
    withName: dss {
        memory = X.GB
    }
}

Where X is the amount of memory in GB that the process should use. Save the file as a custom configuration ending with .config and provide it to Nextflow with the -c option:

nextflow run epi2me-labs/wf-somatic-variation -c <path to custom config file> <options here>
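
As a variation on the configuration above, Nextflow also lets a directive be a closure that depends on task.attempt, so the memory request can grow on each retry instead of being fixed. The sketch below writes such a file; the file name dss_memory.config and the 128.GB starting value are placeholders chosen for illustration, not values recommended by the workflow authors:

```shell
#!/usr/bin/env bash
# Write a custom Nextflow config that retries the dss process with more memory.
# File name and the 128.GB starting value are illustrative placeholders.
cat > dss_memory.config <<'EOF'
process {
    withName: dss {
        // attempt 1 requests 128 GB, attempt 2 requests 256 GB, attempt 3 requests 384 GB
        memory = { 128.GB * task.attempt }
        // retry only on exit statuses that usually indicate the task was killed
        errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
        // up to 2 retries, i.e. at most 3 attempts in total
        maxRetries = 2
    }
}
EOF
```

The resulting file is passed with -c dss_memory.config in the same way as any other custom configuration. The requests can only be satisfied up to what a single node on the cluster can provide.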

RenzoTale88 commented 3 months ago

@SilviaMariaMacri did you try providing a custom configuration file as mentioned above?

SilviaMariaMacri commented 3 months ago

@RenzoTale88 yes, but after setting the memory to 256 GB it kept giving me the same error. I then manually modified the .command.run file to set the memory to 380 GB and launched the job outside the pipeline; it seems to have completed successfully after almost 70 hours of running time. The pipeline is now running with the new memory setting, and I think it will finish without error since the single job did. What do you think is the reason for such a long running time and such high memory use? Can it be avoided?

RenzoTale88 commented 3 months ago

@SilviaMariaMacri it is quite difficult to say. The DSS process, as the name suggests, relies on the DSS R package to identify the differentially modified regions/loci. The impact on the memory is linked to the size of the dataset and the number of cores used for the analysis, which makes it difficult to predict for every use-case.

SilviaMariaMacri commented 2 months ago

Hi @RenzoTale88,

I'm using two whole-genome sequencing BAM files obtained with dorado, with two sets of modification calls (5mC_5hmC and 6mA). The BAM files are 87G and 120G for the normal and tumor tissue, respectively. Six mod:dss processes are submitted to the PBS queue: three of them finish successfully, and the fourth reaches the maximum time limit of 100 hours (each job is set to 1 CPU and 750 GB of memory). So, with more CPUs I get a memory error, and with only 1 CPU I hit the time limit.

Are there any plans to solve this problem, for example by dividing the input files into several files (i.e. one per chromosome) and launching a separate job for each file? Alternatively, do you have any suggestions for my case?
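
The per-chromosome split described above is not a feature of the workflow in this thread, but it can be sketched outside the pipeline. The example below assumes that the input is tab-separated, that its first line is a header (the script reads it with header = T), and that its first column holds the chromosome name. split_by_chromosome and the output naming are hypothetical, and whether DSS run per chromosome gives results equivalent to a whole-genome run is not confirmed here:

```shell
#!/usr/bin/env bash
# Split a tab-separated file with a header line into one file per value of
# column 1 (assumed to be the chromosome), repeating the header in each part.
split_by_chromosome() {
    local input="$1"
    local prefix="$2"
    awk -F'\t' -v prefix="$prefix" '
        NR == 1 { header = $0; next }      # remember the header line
        $1 != current {                    # chromosome changed
            if (out != "") close(out)      # keep only one output file open
            current = $1
            out = prefix "." current ".bed"
            if (!(out in seen)) {          # first sighting: start file with header
                print header > out
                close(out)
                seen[out] = 1
            }
        }
        { print >> out }                   # append the data row
    ' "$input"
}

# Usage (writes normal.<chromosome>.bed next to the input):
#   split_by_chromosome normal.bed normal
```

Only one output file is open at a time, so the sketch also copes with references that contain thousands of contigs. Each per-chromosome pair of tumor and normal files could then be given to a separate DSS job.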

Thanks

RenzoTale88 commented 2 months ago

Hi @SilviaMariaMacri sorry to hear this is giving you issues. Do you have access to the logs of the processes failing (i.e. do you have access to the work directory)? That might help us figure out what is going wrong.

SilviaMariaMacri commented 2 months ago

Thank you for your answer @RenzoTale88. Yes, here are two log files (with exit statuses 143 and 130), but I can't get much information from them: .command.log_exitcode130.log .command.log_exitcode143.log

RenzoTale88 commented 2 months ago

@SilviaMariaMacri thanks for sharing. I'll see if there is a way to reduce the memory usage of the process. I'll keep you updated on the process. Thanks in advance for your patience!

SilviaMariaMacri commented 1 month ago

Hi @RenzoTale88 do you have any update on the process? Thank you

RenzoTale88 commented 1 month ago

@SilviaMariaMacri sorry for the long silence. We have been running a number of tests to figure out how to improve the situation, and we are still working on a longer-term solution for the memory issue. In the meantime, we released v1.3.1, which adds the option --diff_mod; setting it to false disables DSS. This should allow the workflow to run to completion and emit the outputs, which you can then analyse manually. We realise this is not a solution, and I apologise for the inconvenience.

Andrea