epigen / RnBeads

Git working Repo synced to the Bioconductor: http://bioconductor.org/packages/devel/bioc/html/RnBeads.html
https://rnbeads.org/
7 stars 8 forks source link

Is rnb.execute.import() has a bug in reading bismarkCytosine formart bed file #7

Closed cby01 closed 3 years ago

cby01 commented 4 years ago

Hi Pavlo, I keep getting errors with importing bismarkCytosine formart bed file. When I check the log, I found all sites have been removed. So I check the source code, and then I found the function rnb.execute.import() line:248 in the file of loading.R has a puzzling parameter, which the function "read.bed.files()" was called with a wrong parameter 'pos.coord.shift=1L,' and an undeclared parameter 'coord.shift = 0L'.

When I call read.bed.files() independently, it can read and create an object.

So, is it a bug? Please help me to check!

Any help will be highly appreciated.

Thank you!

Below is the parameter pos.coord.shift of function read.bed.files() declaration : param pos.coord.shift The frame shift between the the CpG annotation (1-based) and the coordinates in the loaded BEDs. If BEDs have 0-based coordinates, \code{pos.coord.shift=1} (default)

Below is function rnb.execute.import() line:248 else if (rnb.getOption("import.bed.style")=="bismarkCytosine"){ result <- read.bed.files(base.dir=data.source[[1]], sample.sheet=data.source[[2]], file.names.col=filename.column, verbose=verbose, pos.coord.shift=1L, # I think it should be 0 for (1-based) bed coordinates skip.lines=0, chr.col=1L, start.col=2L, end.col=NA, c.col=4L, t.col=5L, strand.col=3L, mean.meth.col=NA, coverage.col=NA, coord.shift = 0L, # I think it should be remove nrows=nrows)

this is part of my wrong log: 2020-03-17 17:47:30 5.2 STATUS STARTED Loading Data From BED Files 2020-03-17 17:47:31 5.2 INFO Reading BED file: .//data//bed/1322P.cytosine_report_chr1.bed 2020-03-17 17:47:44 7.6 STATUS Read 1 BED files 2020-03-17 17:47:45 4.0 STATUS Matched chromosomes and strands to annotation 2020-03-17 17:47:45 4.0 STATUS Checked for the presence of sites and coverage 2020-03-17 17:47:45 4.2 STATUS Initialized meth/covg matrices opening ff C:/Users/cby/AppData/Local/Temp/Rtmp0yYQkI/ffb71c4df4595.ff 2020-03-17 17:47:45 4.6 STATUS Combined a data matrix with 4568940 sites and 1 samples 2020-03-17 17:47:45 4.6 STATUS Processed all BED files 2020-03-17 17:47:45 4.6 STATUS STARTED Creating RnBiseqSet object 2020-03-17 17:47:50 5.2 WARNING All sites have been removed, returning NULL 2020-03-17 17:47:50 5.2 STATUS COMPLETED Creating RnBiseqSet object 2020-03-17 17:47:50 5.2 STATUS COMPLETED Loading Data From BED Files 2020-03-17 17:47:50 5.2 STATUS Loaded data from .//data//bed

Call read.bed.files()standalone:

data.source <- c( bedDir , sampleSheet, "filename_bed") data.source <- as.list(data.source) result <- read.bed.files(base.dir=data.source[[1]], sample.sheet=data.source[[2]], file.names.col=data.source[[3L]], verbose=TRUE,
pos.coord.shift=0L, #this is my modify skip.lines=0,
chr.col=1L, start.col=2L, end.col=NA, c.col=4L, t.col=5L, strand.col=3L, mean.meth.col=NA, coverage.col=NA, nrows=-1L, usebigff=TRUE)

Reading BED file: .//data//bed/1322P.cytosine_report_chr1.bed Read 1 BED files Matched chromosomes and strands to annotation Checked for the presence of sites and coverage Initialized meth/covg matrices opening ff C:/Users/cby/AppData/Local/Temp/Rtmp0yYQkI/ffb71ca465dd6.ff Combined a data matrix with 4568940 sites and 1 samples Processed all BED files Matched 2284470 of 4568940 methylation sites to the annotation Checking site coverage Removed 207795 of 2284470 methylation sites because they were not covered in any sample Creating methylation matrix Creating coverage matrix Creating object Summarizing strand methylation Summarizing tiling methylation Summarizing genes methylation Summarizing promoters methylation Summarizing cpgislands methylation

Lux91 commented 4 years ago

Hi, I'm having the exact same issue with bismakCytosine files. I literally tried every possible thing. Let me know if you find a workaround.

Lux91 commented 4 years ago

I found a temporary workaround for this problem. If you create a copy of your bismarkCytosine files and modify their coordinates subtracting 1 (with awk or else), rnbeads will add that 1 itself ang the run.import function will work. It's not a permanent fix, but it makes things work. :)

cby01 commented 4 years ago

I modfiy the source code and reinstall it into the R,and it work well,but it not a official revision. I have write a mail to the team but there is no reply.

schmic05 commented 4 years ago

Hi, sorry for being unresponsive, we first had to figure out some internal things and didn't have much time to look into all of this. The changes that you suggested might indeed be reasonable, and it is probably a bug within RnBeads. It would be great if you could create a pull request, such that we can merge your changes into the current version of RnBeads. Thanks a lot for spotting this and for using RnBeads.

cby01 commented 4 years ago

Hi, sorry for being unresponsive. I didn't know how to create a pull request. but I found an old version RnBeads in https://github.com/cirruswolke/RnBeads/blob/master/R/loading.R
here, in line 249, I found an other modfiy, I think it can also solve the problem. else if (rnb.getOption("import.bed.style")=="bismarkCytosine"){ result <- read.bed.files(base.dir=data.source[[1]], sample.sheet=data.source[[2]], file.names.col=filename.column, verbose=verbose, pos.coord.shift=1L, skip.lines=0, chr.col=1L, start.col=2L, end.col=NA, c.col=4L, t.col=5L, strand.col=3L, mean.meth.col=NA, coverage.col=NA, coord.shift = -1L, ##this is the modify nrows=nrows)