CompEpigen / DecompPipeline

Large automated pipeline for running the MeDeCom
GNU General Public License v3.0
4 stars 6 forks source link

importing WGBS #7

Closed bazyliszek closed 4 years ago

bazyliszek commented 4 years ago

Nice development. I notice that WGBS data can be also used here for deconvolution method. rnb.set <- load.rnb.set(system.file("extdata/small_rnbSet.zip",package="DecompPipeline"))

Can other object like bs objects from dmrseq (https://github.com/kdkorthauer/dmrseq) be imported at this step? Would be nice to test performance of this method and compare to methylCC.

Cheers, Marcin

schmic05 commented 4 years ago

Hi Marcin, We currently do not support any other datatype than obtained from RnBeads. Thanks for pointing this out, we can work on also supporting this. We'll let you know if we have some further development on that. As a workaround, you could transform an dmrseq object to an RnBiseqSet object, which then can be used with our package. Best, Michael

bazyliszek commented 4 years ago

Thank you Michael, I will try transformation and will give it try. I will keep this open for few more days, in case there are problems with a structure. I am sure others would like to try this approach as well. Cheer, Marcin

lutsik commented 4 years ago

Good point, thanks. I would then not forget about our new package methrix:

https://github.com/CompEpigen/methrix

bazyliszek commented 4 years ago

So trying transformation:

pheno <- bs.filtered@colData sites <- bs.filtered@rowRanges meth <- bs.filtered@assays@data@listData$M/bs.filtered@assays@data@listData$Cov coverage <- bs.filtered@assays@data@listData$Cov

transobject <- RnBiseqSet(pheno, sites, meth, covg = coverage, assembly = "hg38")

I converted all objects from data frames to matrix and getting this error:

Warning: All sites have been removed, returning NULL

What is the exact structure that pheno and sites suppose to be?

schmic05 commented 4 years ago

Both pheno and sites should be data frames, so I guess that's not the problem. It might be that our internal structure does not match the annotation that you provided. For instance, we use "chr1" rather than "1" for the chromosomes and CpGs should have length 1 (i.e. start 10496, end 10497) and strands are typically merged. Could you post the head of each of the matrices than it might be easier for me to figure out the problem?

bazyliszek commented 4 years ago

Great. Thank you. Here it comes:

RnBiseqSet(pheno, sites, meth, covg = coverage, assembly = "hg38") Warning: All sites have been removed, returning NULL

class(sites) [1] "data.frame" class(pheno) [1] "data.frame" class(meth) [1] "matrix" class(coverage) [1] "matrix" head(pheno,2) condition BDg sample ADF1 A GrpB ADF1 ADF2 A GrpA ADF2 head(sites,2) seqnames start end width strand index 1 chr1 16243 16243 1 1 2 chr1 16555 16555 1 2 meth[1:2, 1:2] [,1] [,2] [1,] 0.75 0.7058824 [2,] 1.00 0.8666667 coverage[1:2, 1:2] [,1] [,2] [1,] 8 17 [2,] 8 15

schmic05 commented 4 years ago

The sites matrix should have three columns with the names chromosome, position and strand as described in ?RnBiseqSeq of the RnBeads documentation:

sites
CpG site definition, as a data.frame with 3 variables: chromosome (of type character), position (integer) and strand (character, one of "+", "-" or "*"

Hope that solves the issue.

bazyliszek commented 4 years ago

Looks like it did help indeed for small object (I was testing on 100000). Thank you. However when I run all 22 millions, I am getting error. Maybe I just have a problem now with server, I do not expect to have any NA in any of the matrix.

NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal, NAreal)) 3: update.ff(ret, from = initdata, delete = FALSE, bydim = bydim, BATCHSIZE = BATCHSIZE, BATCHBYTES = BATCHBYTES, VERBOSE = VERBOSE) 4: ff(rep(na.prototype, row.n), ...) 5: FUN(X[[i]], ...) 6: lapply(1:col.n, FUN = function(j) { res <- ff(rep(na.prototype, row.n), ...) if (active.closing) close.ff(res) res}) 7: lapply(1:col.n, FUN = function(j) { res <- ff(rep(na.prototype, row.n), ...) if (active.closing) close.ff(res) res}) 8: BigFfMat(row.n = nrow(region.indices), col.n = nSamples, col.names = samples(object), finalizer = bff.finalizer) 9: .local(object, ...) 10: summarize.regions(object, "strands", aggregation = aggr) 11: summarize.regions(object, "strands", aggregation = aggr) 12: RnBiseqSet(pheno, sites, meth, covg = coverage, assembly = "hg38")

Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace Selection:

schmic05 commented 4 years ago

It looks as if you don't have sufficient main memory for this analysis. It might be possible to solve this by setting the option: rnb.options("disk.dump.bigff"=FALSE)

bazyliszek commented 4 years ago

great. I needed to put these two to work and now I have the object ready rnb.option("disk.dump.bigff"=FALSE) rnb.option("disk.dump.big.matrices"=FALSE) Thank you!