asardaes / dtwclust

R Package for Time Series Clustering Along with Optimizations for DTW
https://cran.r-project.org/package=dtwclust
GNU General Public License v3.0

tsclust() aborts RStudio session #76

Open IrelCM opened 2 weeks ago

IrelCM commented 2 weeks ago

Hello, I am using the tsclust function and then doing a cluster evaluation. This is the command:

clust.pam <- tsclust(DF.time, k = 4L:12L, distance = "dtw_basic", centroid = "pam", seed = 1234L)

A few minutes after I execute the command, RStudio shows this message: R session aborted. R encountered a fatal error. The session was terminated.

My DF.time comes from an expression matrix (81 samples and 3450 genes), with samples in columns and genes in rows. I then used reshape2::cast so that each row is a Gene_Condition_Replicate ID and each column is a time point (there are 9).

dim(DF.time)
[1] 31050 9

I think the problem comes from the dimensions of my DF, because I tested with 100 rows and it works. Could you help me, please?

Irelka

sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape2_1.4.4 dtwclust_5.5.12 dtw_1.23-1 proxy_0.4-27 NbClust_3.0.1
[6] factoextra_1.0.7 vsn_3.72.0 pathview_1.44.0 ComplexHeatmap_2.20.0 splitstackshape_1.4.8
[11] ggpubr_0.6.0.999 DESeq2_1.44.0 SummarizedExperiment_1.34.0 Biobase_2.64.0 MatrixGenerics_1.16.0
[16] matrixStats_1.3.0 GenomicRanges_1.56.0 GenomeInfoDb_1.40.1 IRanges_2.38.0 S4Vectors_0.42.0
[21] BiocGenerics_0.50.0 clusterProfiler_4.12.0 gridExtra_2.3 ggrepel_0.9.5 ggplot2_3.5.1
[26] stringr_1.5.1 dplyr_1.1.4 magrittr_2.0.3 readxl_1.4.3 VennDiagram_1.7.3
[31] futile.logger_1.4.3 readr_2.1.5

asardaes commented 2 weeks ago

Remember that if n is the number of series, the cross-distance matrix will need n^2 elements. With n = 31050 and 8 bytes per number, you'll need a bit over 7 GiB of RAM just for the matrix. How much memory do you have available?
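The estimate above can be reproduced with a few lines of R. This is only a back-of-the-envelope sketch: the helper name dist_matrix_gib is made up for illustration and is not part of dtwclust, and it assumes a dense n x n matrix of 8-byte doubles with no other overhead.

```r
# Hypothetical helper (not part of dtwclust): rough size in GiB of a dense
# n x n cross-distance matrix, assuming 8 bytes per element.
dist_matrix_gib <- function(n, bytes_per_element = 8) {
  n <- as.numeric(n)            # use doubles so large n cannot overflow integers
  n^2 * bytes_per_element / 1024^3
}

# The 100-row test case from the report versus the full data set.
small <- dist_matrix_gib(100)
full  <- dist_matrix_gib(31050)

cat(sprintf("n = 100:   %.6f GiB\n", small))
cat(sprintf("n = 31050: %.2f GiB\n", full))
```

The 100-row test needs well under a megabyte, while the full 31050 rows need a bit over 7 GiB, which fits with the small test working and the full run failing.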

IrelCM commented 1 week ago

I assume that I don't have enough available:

free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        10Gi       361Mi       336Mi       5.3Gi       5.0Gi
Swap:          976Mi       976Mi          0B

asardaes commented 1 week ago

Yeah, that seems likely. Unfortunately there's not much that can be done in that case; R works with data in memory :/

IrelCM commented 1 week ago

Thank you. I will try with another machine ;)