ZW-xjtlu / exomePeak2

Peak calling and differential methylation for MeRIP-Seq

failing due to memory / "caught segfault, address (nil), cause 'unknown'" #40

Closed acurry-hyde closed 1 year ago

acurry-hyde commented 1 year ago

Hello - I've been attempting to run exomePeak2 on the hg38 full transcript for chr10 only, with single samples provided for IP and input, control vs. treated. It has been failing consistently due to 1) memory limit issues and 2) now, with increased memory, the error: "caught segfault, address (nil), cause 'unknown'".

I'm using an HPC Linux OS, requesting a single node with 20 threads and 128 GB memory, and a conda-compiled R 4.3 environment. Within R I'm increasing the memory limit to 100 GB using the 'ulimit' package.

1) memory limit issue:

> library(exomePeak2)
> library(DESeq2)
> library(GenomicFeatures)
> library(BSgenome.Hsapiens.UCSC.hg38)
> gc()
           used  (Mb)  gc trigger     (Mb) max used  (Mb)
Ncells  9343582 499.1    19313185   1031.5 11348671 606.1
Vcells 16300605 124.4 13107200000 100000.0 28107829 214.5
> txdb <- makeTxDbFromGFF(
+     file = "/ifs/data/sz3145_gp/genomes/Homo_sapiens/UCSC/hg38/Annotation/gencode.v43.annotation.gtf",
+     format = "gtf",
+     dataSource = "GENCODE GTF",
+     organism = "Homo sapiens")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
> exomePeak2(
+     bam_ip = c(
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304491_dedup.REF_chr10.bam"
+         ),
+     bam_input = c(
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304485_dedup.REF_chr10.bam"
+         ),
+     bam_input_treated = c(   
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304487_dedup.REF_chr10.bam"
+         ),
+     bam_ip_treated = c(
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304493_dedup.REF_chr10.bam"
+         ),
+     txdb = txdb,
+     genome = "BSgenome.Hsapiens.UCSC.hg38",
+     mode = "full_transcript" 
+     )
Extract bin features ... OK
Count reads on bin features ... OK
Identify background features ... OK
Estimate sample sepecific size factors from the background ... OK
Calculate bin GC contents on exons ... Killed

2) memory increased, caught segfault error:

> devtools::install_github("krlmlr/ulimit")
> library(ulimit)
> ulimit::memory_limit(100000) 
 soft  hard
1e+05   Inf
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  924928 49.4    1812270 96.8  1328472 71.0
Vcells 2404583 18.4    8388608 64.0  5079774 38.8
> library(exomePeak2)
> library(DESeq2)
> library(GenomicFeatures)
> library(BSgenome.Hsapiens.UCSC.hg38)
> txdb <- makeTxDbFromGFF(
+     file = "/ifs/data/sz3145_gp/genomes/Homo_sapiens/UCSC/hg38/Annotation/gencode.v43.annotation.gtf",
+     format = "gtf",
+     dataSource = "GENCODE GTF",
+     organism = "Homo sapiens")
> exomePeak2(
+     bam_ip = c(
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304491_dedup.REF_chr10.bam"
+         ),
+     bam_input = c(
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304485_dedup.REF_chr10.bam"
+         ),
+     bam_input_treated = c(   
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304487_dedup.REF_chr10.bam"
+         ),
+     bam_ip_treated = c(
+         "/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304493_dedup.REF_chr10.bam"
+         ),
+     txdb = txdb,
+     genome = "BSgenome.Hsapiens.UCSC.hg38",
+     mode = "full_transcript" 
+     )
Extract bin features ... OK
Count reads on bin features ... OK
Identify background features ... OK
Estimate sample sepecific size factors from the background ... OK
Calculate bin GC contents on exons ... OK
Fit GC curves with smoothing splines ... OK
Calculate offset matrix for bins ... OK
Detect peaks with GLM ...
 *** caught segfault ***
address (nil), cause 'unknown' 

Traceback:
 1: t(exp(modelMatrix %*% t(betaRes$beta_mat)))
 2: fitNbinomGLMs(objectNZ, betaTol = betaTol, maxit = maxit, useOptim = useOptim,     useQR = useQR, renameCols = renameCols, modelMatrix = modelMatrix,     minmu = minmu)
 3: nbinomWaldTest(dds)
 4: callPeaks(se, txdb, test_method, p_cutoff, exByGene, bin_size,     motif_based)
 5: force(x)
 6: quiet(.)
 7: callPeaks(se, txdb, test_method, p_cutoff, exByGene, bin_size,     motif_based) %>% quiet
 8: diffAnalysis(bam_IP = bam_ip, bam_input = bam_input, bam_IP_treated = bam_ip_treated,     bam_input_treated = bam_input_treated, txdb = txdb, genome = genome,     bin_size = bin_size, step_size = step_size, fragment_length = fragment_length,     strandness = strandness, test_method = test_method, p_cutoff = p_cutoff,     plot_gc = plot_gc, parallel = parallel, motif_based = motif_based,     motif_sequence = "DRACH")
 9: exomePeak2(bam_ip = c("/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304491_dedup.REF_chr10.bam"),     bam_input = c("/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304485_dedup.REF_chr10.bam"),     bam_input_treated = c("/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304487_dedup.REF_chr10.bam"),     bam_ip_treated = c("/ifs/data/sz3145_gp/fy2306/projects/rotation/data/GSE207663/sambamba_default/unique/GSM6304493_dedup.REF_chr10.bam"),     txdb = txdb, genome = "BSgenome.Hsapiens.UCSC.hg38", mode = "full_transcript")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

As we would prefer to run exomePeak2 on the whole BAM file, splitting it into chromosomes was only a workaround; however, even this single chromosome is not running to completion. I've read that exomePeak2 is highly memory intensive, yet the documentation states that only 4 GB of memory is needed for large genomes...

We've successfully run exomePeak2 on exons only, but we need the full-transcript results. Any help/guidance would be greatly appreciated!

ZW-xjtlu commented 1 year ago

Dear exomePeak2 User,

Thank you for highlighting the bug in the current version of exomePeak2. We discovered that the memory overflow issue originated from the removal of the read count filter for bins during last year's significant methodological update. This change was designed to enhance sensitivity, but it unfortunately led to memory problems when the mode is set to 'full_transcript'. We have addressed this issue in our latest update (version 1.9.3), which is now available in the exomePeak2 GitHub repository. Note that results in exon-only mode will not be affected by this upgrade. To update your package, please use remotes::install_github("ZW-xjtlu/exomePeak2").
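For reference, the update and a version check can be done from an R session as follows (a minimal sketch; it assumes the `remotes` package is already installed, and the `1.9.3` version string is taken from the reply above):

```r
# Install the patched exomePeak2 (>= 1.9.3) from GitHub;
# requires the 'remotes' package to be installed first
remotes::install_github("ZW-xjtlu/exomePeak2")

# Confirm the installed version before re-running the full_transcript analysis
packageVersion("exomePeak2")  # should report '1.9.3' or later
```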

We appreciate your continued support and feedback.

Best regards, Zhen Wei