Open DarioS opened 3 years ago
Thanks for discovering and reporting this. Would you please provide a reproducible example?
I made a minimal example, and it worked fast. It must be something particular about my data.
GRL1 <- split(GRanges("chr1", IRanges(rpois(1000000, 100000), width = rpois(1000000, 1000))),
sample(20000, 1000000, replace = TRUE))
GRL2 <- split(GRanges("chr1", IRanges(rpois(1000000, 100000), width = rpois(1000000, 1000))),
sample(20000, 1000000, replace = TRUE))
GRL1 <- as(GRL1, "GRangesList")
GRL2 <- as(GRL2, "GRangesList")
system.time(pc(GRL1, GRL2))
user system elapsed
0.314 0.035 0.363
I wanted to share my real data for you to check, but saving it failed after a few minutes of processing.
> sum(elementNROWS(upstreamCisRegions))
[1] 437067
> sum(elementNROWS(downstreamCisRegions))
[1] 596053
> class(upstreamCisRegions)
[1] "SimpleGRangesList"
attr(,"package")
[1] "GenomicRanges"
> class(downstreamCisRegions)
[1] "SimpleGRangesList"
attr(,"package")
[1] "GenomicRanges"
> save.image(upstreamCisRegions, downstreamCisRegions, file = "CisRegions.RData")
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, :
subscript is out of bounds
Warning message:
In file.remove(outfile) :
cannot remove file 'CisRegions.RDataTmp', reason 'No such file or directory'
Saving the test lists also fails.
> class(GRL1)
[1] "CompressedGRangesList"
attr(,"package")
[1] "GenomicRanges"
> class(GRL2)
[1] "CompressedGRangesList"
attr(,"package")
[1] "GenomicRanges"
> save.image(GRL1, GRL2, file = "test.RData")
Error in relist(flesh, PartitioningByEnd(skeleton)) :
shape of 'skeleton' is not compatible with 'NROW(flesh)'
Warning message:
In file.remove(outfile) :
cannot remove file 'test.RDataTmp', reason 'No such file or directory'
I have a fairly standard R configuration.
> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.48.0 BiocParallel_1.22.0 GenomicFeatures_1.40.0 AnnotationDbi_1.50.0
[5] Biobase_2.48.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.2 IRanges_2.22.2
[9] S4Vectors_0.26.1 BiocGenerics_0.34.0
dont use save.image, use save
save(GRL1, GRL2, file = "test.RData")
This should work
A previous question on the support site has a comment on an answer from the person asking the question:
The solution involves the use of
pc
. When I do it with the current version of GenomicRanges and R 4.0.1, it takes a few minutes. Could it be a performance regression since 2017? The length of each list is also about 20000 components, like the original question.