Closed connorrogerson closed 2 years ago
ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = data.frame(annotation_file), colData=col_data, design=~type)
Here the issue is the large annotation matrix and casting it to a data.frame. Instead, you can pass the path to the annotation file directly:
annotation_file <- 'path/to/annotation_file'
ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = annotation_file, colData=col_data, design=~type)
This uses fread internally.
Hope that helps
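For a rough sense of scale, here is a back-of-the-envelope estimate based on the dimensions reported in the issue (88,574,203 rows by 12 columns). It assumes every value is stored as an 8-byte double, which is only an approximation: the real annotation contains character columns, and casting to a data.frame can create additional copies.

```r
# Rough size estimate for one in-memory copy of the annotation table.
n_rows <- 88574203    # annotation rows reported in the issue
n_cols <- 12          # annotation columns reported in the issue
bytes_per_value <- 8  # assumption: every value stored as a double

gib <- n_rows * n_cols * bytes_per_value / 1024^3
cat(sprintf("Approximate size of one copy: %.1f GiB\n", gib))
```

Under that assumption a single copy already needs close to 8 GiB, so any extra copy made during conversion can exhaust the memory of a typical desktop.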
Hi Sudeep
I did try this initially and it errored as follows:
Warning in writeBin(bfr, con = out, size = 1L) : problem writing to connection
(this repeats a lot)
Error in fread(fname, sep = "\t", stringsAsFactors = FALSE, header = TRUE) : Opened 5.178GB (5560004608 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.
Could you please post your sessionInfo()?
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ff_4.0.4 bit_4.0.4 ggrepel_0.9.1 data.table_1.14.0
[5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6 purrr_0.3.4
[9] readr_1.4.0 tidyr_1.1.3 tibble_3.1.1 ggplot2_3.3.5
[13] tidyverse_1.3.1 IHW_1.14.0 DEWSeq_1.0.6 DESeq2_1.26.0
[17] SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1 matrixStats_0.58.0
[21] Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1 IRanges_2.20.2
[25] S4Vectors_0.24.4 BiocGenerics_0.32.0 R.utils_2.11.0 R.oo_1.24.0
[29] R.methodsS3_1.8.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-1 ellipsis_0.3.2 htmlTable_2.3.0 XVector_0.26.0 fs_1.5.0
[6] base64enc_0.1-3 rstudioapi_0.13 bit64_4.0.5 AnnotationDbi_1.48.0 fansi_0.4.2
[11] lubridate_1.7.10 xml2_1.3.2 splines_3.6.3 cachem_1.0.4 geneplotter_1.64.0
[16] knitr_1.36 Formula_1.2-4 jsonlite_1.7.2 broom_0.7.10 annotate_1.64.0
[21] cluster_2.1.2 dbplyr_2.1.1 png_0.1-7 BiocManager_1.30.16 compiler_3.6.3
[26] httr_1.4.2 backports_1.2.1 assertthat_0.2.1 Matrix_1.3-3 fastmap_1.1.0
[31] cli_3.0.1 htmltools_0.5.1.1 tools_3.6.3 gtable_0.3.0 glue_1.4.2
[36] GenomeInfoDbData_1.2.2 Rcpp_1.0.6 slam_0.1-48 cellranger_1.1.0 vctrs_0.3.8
[41] xfun_0.25 rvest_1.0.2 lifecycle_1.0.1 XML_3.99-0.3 zlibbioc_1.32.0
[46] scales_1.1.1 hms_1.1.1 RColorBrewer_1.1-2 yaml_2.2.1 memoise_2.0.1
[51] gridExtra_2.3 rpart_4.1-15 latticeExtra_0.6-29 stringi_1.6.1 RSQLite_2.2.7
[56] genefilter_1.68.0 checkmate_2.0.0 rlang_0.4.11 pkgconfig_2.0.3 bitops_1.0-7
[61] evaluate_0.14 lattice_0.20-44 lpsymphony_1.14.0 htmlwidgets_1.5.4 tidyselect_1.1.1
[66] magrittr_2.0.1 R6_2.5.1 generics_0.1.1 Hmisc_4.5-0 DBI_1.1.1
[71] withr_2.4.3 pillar_1.6.4 haven_2.4.1 foreign_0.8-75 survival_3.2-11
[76] RCurl_1.98-1.3 nnet_7.3-16 modelr_0.1.8 crayon_1.4.2 fdrtool_1.2.16
[81] utf8_1.2.1 rmarkdown_2.11 jpeg_0.1-8.1 locfit_1.5-9.4 grid_3.6.3
[86] readxl_1.3.1 blob_1.2.2 reprex_2.0.1 digest_0.6.27 xtable_1.8-4
[91] munsell_0.5.0
According to this Stack Overflow post, "Warning in writeBin(bfr, con = out, size = 1L) : problem writing to connection" again looks like a tmp folder issue,
and "file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available." looks like this data.table issue.
A quick fix would probably be to update your R version or data.table.
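To narrow this down, a minimal diagnostic sketch like the one below might help. It only uses base R calls (tempdir, tempfile, packageVersion), and the variable names tmp, probe and ok are made up for illustration. Note that a successful small write does not prove there is enough free space for a 5 GB file; only a failure is informative.

```r
# Directory R uses for temporary files in this session
tmp <- tempdir()
cat("Temp dir:", tmp, "\n")

# Versions relevant to the fread error
cat("R:", R.version.string, "\n")
if (requireNamespace("data.table", quietly = TRUE)) {
  cat("data.table:", as.character(packageVersion("data.table")), "\n")
}

# Small write test: a failure here points to a full or read-only temp folder
probe <- tempfile(tmpdir = tmp)
ok <- tryCatch({
  writeLines("test", probe)
  TRUE
}, error = function(e) FALSE)
unlink(probe)
cat("Temp dir writable:", ok, "\n")
```

If the temp folder sits on a small or full drive, setting the TMPDIR environment variable (for example in .Renviron) before starting R points tempdir() at a different location.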
Leaving this open until a fix can be found
The memory issue is due to running this on the rubbish computers you get in UK universities. I freed some space in the tmp directory and re-made the annotation file using a sliding window of w100s50, and this solved the issue for me.
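If re-making the annotation is not an option, another possible route is to shrink the annotation before it is loaded in full, by reading the file in chunks and keeping only the windows that occur in the count matrix. The sketch below is an illustration, not something from DEWSeq itself: filter_annotation is a hypothetical helper, and it assumes a tab-separated file with a header row and an ID column (called unique_id here, an assumption) whose values match rownames(count_matrix).

```r
# Hypothetical helper: read a tab-separated annotation file chunk by chunk
# and keep only the rows whose ID is in keep_ids.
filter_annotation <- function(path, keep_ids, id_col = "unique_id",
                              chunk_size = 1e5) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # First line is the header; it gives the column names for every chunk
  header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    chunk <- read.delim(text = lines, header = FALSE, col.names = header,
                        stringsAsFactors = FALSE)
    # Keep only the rows for windows we actually need
    kept[[length(kept) + 1]] <- chunk[chunk[[id_col]] %in% keep_ids, ,
                                      drop = FALSE]
  }
  do.call(rbind, kept)
}

# Toy example with a made-up annotation of 10 windows
ann <- data.frame(unique_id = paste0("w", 1:10),
                  gene_id = paste0("g", rep(1:5, each = 2)),
                  stringsAsFactors = FALSE)
path <- tempfile(fileext = ".tsv")
write.table(ann, path, sep = "\t", quote = FALSE, row.names = FALSE)

small <- filter_annotation(path, keep_ids = c("w2", "w7"), chunk_size = 3)
print(small)
```

On the real data, keep_ids would be rownames(count_matrix), and the filtered result could be written back to disk and passed as annotObj. Whether windows missing from the count matrix can safely be dropped is an assumption worth checking first.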
I'm running out of memory trying to create the ddw object using DESeqDataSetFromSlidingWindows.
Code is:
ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = data.frame(annotation_file), colData=col_data, design=~type)
Result is:
Error: cannot allocate vector of size 1024.0 Mb
I checked the dimensions of the matrices:
> dim(count_matrix)
[1] 381366 16
> dim(annotation_file)
[1] 88574203 12
Does the annotation seem a bit large? I struggled to load this into R in the first place using fread, so I used ff instead. I followed the examples from https://link.springer.com/protocol/10.1007%2F978-1-0716-1851-6_10 to generate the annotation file, so I'm not sure how to fix it.
Any help would be much appreciated.