al2na / methylKit

R package for DNA methylation analysis
https://bioconductor.org/packages/release/bioc/html/methylKit.html
204 stars 96 forks source link

segfault when uniting large methylrawlistDB object #269

Open ethan-shealy opened 1 year ago

ethan-shealy commented 1 year ago

Hi, I am having an issue where a my analysis is proceeding normally, then a crash happens which terminates the process. I am working with a large methylation dataset from a whole-genome bisulfite experiment, so have something like 10 million CpGs over 40 samples. The first "unite" function I tried worked fine,

methOBJ <- methRead(location = fileLOC.list,
                     sample.id = as.list(file.list),
                     assembly = "AnoSag2.1", 
                     context = "CpG",
                     mincov = 5,
                     treatment = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))  
Received list of locations.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.

 methOBJ <- filterByCoverage(methOBJ,lo.count = 5,lo.perc = NULL,
                             hi.count = NULL,hi.perc = 99.9)

 methOBJ <- normalizeCoverage(methOBJ)

 methFrame.total <- unite(methOBJ, destrand = TRUE, mc.cores = 1, min.per.group = 30L,
                          save.db = TRUE, suffix = "total_30L_", dbdir = getwd())
destranding...
uniting...
creating directory /scratch/es88065/anolis_methylseq/methylkit/methylDB_2022-10-20_f40
compressing the file with bgzip...
making tabix index...
flatfile located at: /scratch/es88065/anolis_methylseq/methylkit/methylDB_2022-10-20_f40/methylBase_total_30L_.txt.bgz

mat.methFrame.total <- percMethylation(methFrame.total)

 write.table(mat.methFrame.total, file="/scratch/es88065/anolis_methylseq/methylkit/total_mat_30L.data", quote = F, sep = "\t", row.names = F)

 methFrame.total.raw <- as(methFrame.total, "methylBase")
Uncompressing file.
Reading file.

 write.table(methFrame.total.raw, file="/scratch/es88065/anolis_methylseq/methylkit/total_meth_30L.data", quote = F, sep = "\t", row.names = F)

 rm("methFrame.total", "mat.methFrame.total", "methFrame.total.raw")

but since I want to utilize two different filtering steps I used the 'reorganize' function and tried to unite again. Upon doing this, I received this recursive error and the process crashed.

methOBJ.sex.age <- reorganize(methOBJ,
                           sample.ids=as.list(file.list),
                           treatment= c(4,4,4,10,10,4,11,5,11,5,11,5,9,3,3,3,9,9,1,7,1,7,8,2,2,2,2,8,8,8,8,12,12,6,12,6,6))

 methFrame.sex.age <- unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,
                            save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
destranding...

 *** caught segfault ***
address 0x2ba042876020, cause 'memory not mapped'
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation

Traceback:
 1: paste(cpgF$chr, cpgF$start, cpgF$end)
 2: .CpG.dinuc.unify(df)
 3: unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,     save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
 4: unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,     save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
An irrecoverable exception occurred. R is aborting now ...

I only utilized ~50% of my allocated memory, so I do not think this is the issue. I previously ran into a segfault error using methylkit's "as" command to extract methylbase data to a matrix. I had to use methylation.matrix <- methylKit:::fread.gzipped(methFrame.total@dbpath, stringsAsFactors = FALSE, data.table = FALSE,skipDecompress=FALSE) to resolve that issue and avoid the segfault error. Is this issue arising from something similar? Thanks for your help.

alexg9010 commented 1 year ago

HI @ethan-shealy,

Thanks for reporting this issue. Unfortunately, these kinds of errors are rather hard to debug without the actual data at hand. Is this error actually reproducible and happens repeatedly? Would you be able to generate a minimal, reproducible example? Also, are you running the latest package versions of methylKit and data.table?

Related to the fix of your previous error, it seems that the use of data.table objects is causing these segfaults. From what I read online, one reoccurring solution was to limit the number of threads using setDTthreads. Maybe you could test if this fixes the problem?

Best, Alex

ethan-shealy commented 1 year ago

Hi @alexg9010, this error does occur consistently on the unite step, but unfortunately I wasn't able to create a feasible reproducible example not involving a huge data set. I have been running these large computations on my university's computing cluster, and it seems the only solution was bringing it to my local machine and letting it chug along for a while. Because of this, I'm guessing that the issue is due to whatever version of R, data.table, and methylKit that our cluster is running. Just for the reference of anyone finding this thread, here are the software versions. R version 4.2.1 methylKit version 1.22.0 data.table version 1.14.2 Running on CentOS Linux 7 kernel 3.10.0-1160.36.2.el7.x86_64