EMBL-Hentze-group / DEWSeq

R/Bioconductor package for e/iCLIP data analysis

Memory issues #1

Closed by connorrogerson 2 years ago

connorrogerson commented 2 years ago

I'm running out of memory while trying to create the ddw object using DESeqDataSetFromSlidingWindows.

Code is: ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = data.frame(annotation_file), colData=col_data, design=~type)

Result is: Error: cannot allocate vector of size 1024.0 Mb

I checked the dimensions of the matrices:

> dim(count_matrix)
[1] 381366 16
> dim(annotation_file)
[1] 88574203 12

Does the annotation seem a bit large? I struggled to load this into R in the first place using fread, so I used ff instead.

I followed the examples from https://link.springer.com/protocol/10.1007%2F978-1-0716-1851-6_10 to generate the annotation file so I'm not sure how to fix it.

Any help would be much appreciated.
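For a rough sense of scale, the reported dimensions of the annotation table can be turned into a back-of-the-envelope memory estimate. This is only a sketch: it assumes every column is stored as an 8-byte numeric value, which is not true of a real annotation table (character and integer columns behave differently), so treat the result as an order-of-magnitude figure rather than a measurement.

```r
# Rough in-memory size estimate for the annotation table.
n_rows <- 88574203    # rows reported by dim(annotation_file)
n_cols <- 12          # columns reported by dim(annotation_file)
bytes_per_value <- 8  # ASSUMPTION: every value stored as an 8-byte double

est_bytes <- n_rows * n_cols * bytes_per_value
est_gib <- est_bytes / 1024^3

cat(sprintf("Approximate size: %.1f GiB\n", est_gib))
```

Under that assumption the table alone comes to roughly 8 GiB, before any copy made by data.frame() or by downstream steps, which may be more than a typical desktop can spare.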

sudeepsahadevan commented 2 years ago

ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = data.frame(annotation_file), colData=col_data, design=~type)

The issue here is the large annotation table and casting it to a data.frame. Instead, you can pass the path to the annotation file:

annotation_file <- 'path/to/annotation_file'

ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = annotation_file, colData=col_data, design=~type)

This uses fread internally. Hope that helps.

connorrogerson commented 2 years ago

Hi Sudeep

I did try this initially and it errored as follows:

Warning in writeBin(bfr, con = out, size = 1L) : problem writing to connection (this repeats a lot)

Error in fread(fname, sep = "\t", stringsAsFactors = FALSE, header = TRUE) : Opened 5.178GB (5560004608 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.

sudeepsahadevan commented 2 years ago

Could you please post your sessionInfo() ?

connorrogerson commented 2 years ago

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ff_4.0.4 bit_4.0.4 ggrepel_0.9.1 data.table_1.14.0
[5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6 purrr_0.3.4
[9] readr_1.4.0 tidyr_1.1.3 tibble_3.1.1 ggplot2_3.3.5
[13] tidyverse_1.3.1 IHW_1.14.0 DEWSeq_1.0.6 DESeq2_1.26.0
[17] SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1 matrixStats_0.58.0
[21] Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1 IRanges_2.20.2
[25] S4Vectors_0.24.4 BiocGenerics_0.32.0 R.utils_2.11.0 R.oo_1.24.0
[29] R.methodsS3_1.8.1

loaded via a namespace (and not attached):
[1] colorspace_2.0-1 ellipsis_0.3.2 htmlTable_2.3.0 XVector_0.26.0 fs_1.5.0
[6] base64enc_0.1-3 rstudioapi_0.13 bit64_4.0.5 AnnotationDbi_1.48.0 fansi_0.4.2
[11] lubridate_1.7.10 xml2_1.3.2 splines_3.6.3 cachem_1.0.4 geneplotter_1.64.0
[16] knitr_1.36 Formula_1.2-4 jsonlite_1.7.2 broom_0.7.10 annotate_1.64.0
[21] cluster_2.1.2 dbplyr_2.1.1 png_0.1-7 BiocManager_1.30.16 compiler_3.6.3
[26] httr_1.4.2 backports_1.2.1 assertthat_0.2.1 Matrix_1.3-3 fastmap_1.1.0
[31] cli_3.0.1 htmltools_0.5.1.1 tools_3.6.3 gtable_0.3.0 glue_1.4.2
[36] GenomeInfoDbData_1.2.2 Rcpp_1.0.6 slam_0.1-48 cellranger_1.1.0 vctrs_0.3.8
[41] xfun_0.25 rvest_1.0.2 lifecycle_1.0.1 XML_3.99-0.3 zlibbioc_1.32.0
[46] scales_1.1.1 hms_1.1.1 RColorBrewer_1.1-2 yaml_2.2.1 memoise_2.0.1
[51] gridExtra_2.3 rpart_4.1-15 latticeExtra_0.6-29 stringi_1.6.1 RSQLite_2.2.7
[56] genefilter_1.68.0 checkmate_2.0.0 rlang_0.4.11 pkgconfig_2.0.3 bitops_1.0-7
[61] evaluate_0.14 lattice_0.20-44 lpsymphony_1.14.0 htmlwidgets_1.5.4 tidyselect_1.1.1
[66] magrittr_2.0.1 R6_2.5.1 generics_0.1.1 Hmisc_4.5-0 DBI_1.1.1
[71] withr_2.4.3 pillar_1.6.4 haven_2.4.1 foreign_0.8-75 survival_3.2-11
[76] RCurl_1.98-1.3 nnet_7.3-16 modelr_0.1.8 crayon_1.4.2 fdrtool_1.2.16
[81] utf8_1.2.1 rmarkdown_2.11 jpeg_0.1-8.1 locfit_1.5-9.4 grid_3.6.3
[86] readxl_1.3.1 blob_1.2.2 reprex_2.0.1 digest_0.6.27 xtable_1.8-4
[91] munsell_0.5.0

sudeepsahadevan commented 2 years ago

From this stackoverflow post, "Warning in writeBin(bfr, con = out, size = 1L) : problem writing to connection" looks like a tmp folder issue,

and "file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available." looks like this data.table issue.

A quick fix would probably be updating the R version or data.table?
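Before updating anything, it may help to confirm what state the temp folder and the installed versions are actually in. The following is a minimal sketch using only base R; it does not check free disk space (base R has no portable call for that), and file.access() can be unreliable on some Windows file systems, which is why a small probe file is written as well.

```r
# Sketch: check that R's session temp directory exists and is writable.
tmp <- tempdir()
tmp_exists <- dir.exists(tmp)
tmp_writable <- file.access(tmp, mode = 2) == 0  # 0 means write access is granted

# Write and remove a small probe file as a practical test.
probe <- tempfile(pattern = "probe_", tmpdir = tmp)
probe_ok <- tryCatch({
  writeLines("test", probe)
  file.exists(probe)
}, error = function(e) FALSE)
unlink(probe)

cat("Temp dir:", tmp, "\n")
cat("Exists:", tmp_exists, " Writable:", tmp_writable, " Probe write ok:", probe_ok, "\n")

# Report the versions relevant to the suggested update.
cat("R version:", as.character(getRversion()), "\n")
if (requireNamespace("data.table", quietly = TRUE)) {
  cat("data.table version:", as.character(packageVersion("data.table")), "\n")
}
```

If the temp folder sits on a nearly full drive, one option is to point R at a different location by setting the TMPDIR environment variable (for example in .Renviron) before starting R, since R picks its temp directory at startup.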

Leaving this open until a fix can be found

connorrogerson commented 2 years ago

The memory issue is due to running this on the rubbish computers you get in UK universities. I freed some space in the tmp directory and re-made the annotation file using a sliding window of w100s50, and this solved the issue for me.
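To illustrate why a larger window and step shrink the annotation, here is a small sketch that counts sliding windows over a single region. It assumes w100s50 means a window of 100 nt with a step of 50 nt, and it counts only full-length windows; the function name n_windows, the 10,000 nt region, and the smaller comparison setting (window 50, step 20) are hypothetical values chosen for illustration, not the settings originally used in this thread or the exact behaviour of the tool that generates the windows.

```r
# Count full-length sliding windows over a region of a given length.
# ASSUMPTION: trailing partial windows are ignored; a real tool may treat them differently.
n_windows <- function(region_length, window, step) {
  if (region_length < window) return(0)
  floor((region_length - window) / step) + 1
}

region_length <- 10000  # hypothetical region length in nt

small <- n_windows(region_length, window = 50, step = 20)   # hypothetical smaller setting
large <- n_windows(region_length, window = 100, step = 50)  # w100s50, as assumed above

cat("Windows at w50s20: ", small, "\n")
cat("Windows at w100s50:", large, "\n")
cat(sprintf("Reduction factor: %.1fx\n", small / large))
```

Since the annotation file has one row per window, the row count scales roughly with 1/step, so a larger step cuts the file size and the memory needed to load it.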