Closed trashmai closed 2 months ago
Hi
What are the dimensions of your data set (number of rows, columns, and classes)?
Using cross-validation in bca is really useful only when the number of columns is much higher than the number of rows, because of the risk of spurious groups in this case. You say you have 17073 rows; with an even higher number of columns, you could run into memory availability problems.
Also note that you may use cross-validation on a limited number of bca axes (instead of keeping all axes) to spare memory.
Jean
Hi Jean,
We have 68 classes and 59 columns, and we set nf=3 for both PCA and BCA. So, the number of axes for LOOCV should be 3 as well, right? (I tested nax=0 and nax=3 for LOOCV on 500 randomly sampled rows and got very similar results.)
We have around 100GB of free memory. Although I didn't monitor the memory usage, I ran LOOCV in parallel mode with both 8000 and 5000 randomly sampled rows. It worked with 5000 rows but failed on 8000. I'm now running it with parallel set to FALSE, and I hope to get a proper result. However, the progress bar indicates that the ETA is more than 6 days.
Thanks for your reply.
FYI: Parallel processing also failed on a machine with over 600GB of memory, resulting in the same error.
Thanks, I am trying to look into this
Can you check with the current devel version of ade4 on GitHub ?
I re-installed ade4 from GitHub following the instructions in the README, re-ran the full analysis last night, and got exactly the same errors this morning.
I am sorry but I cannot reproduce the error that you mentioned ("Error in xcoo1[ind1, nax] : subscript out of bounds"). Note that this error happens only after the leave one out cross-validation loop. It happens during the computation of the group overlap index between bca and cross-validation coordinates, so it is not done in parallel computing mode.
I checked with 10,000 rows, 100 columns and 100 groups on an M1 Mac computer with only 8 GB of memory, and all computations went fine. Moreover, computation times are much shorter than the ones you reported: only about 1 hour for 10,000 rows, 100 columns and 100 groups in single core and 20 minutes in multicore (parallel with 8 cores).
What kind of computer system are you using ? Can you please give us your sessionInfo() outputs ?
Thanks, Jean
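For anyone who wants to try a similar check on their own machine, here is a minimal sketch of a simulated test of the kind described above. The dimensions are deliberately scaled down, and the simulated data, seed, and variable names are assumptions for illustration only; they are not the data or script used in this thread. It relies on the `nax` and `parallel` arguments of `loocv` that are discussed above.

```r
library(ade4)

set.seed(42)  # arbitrary seed, only so the simulation is repeatable

# Hypothetical, scaled-down dimensions (the check in this thread used
# 10,000 rows, 100 columns and 100 groups)
n_rows <- 1000
n_cols <- 20
n_grp  <- 10

# Simulated table of pure noise and a balanced, shuffled grouping factor
X   <- as.data.frame(matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols))
fac <- factor(sample(rep_len(seq_len(n_grp), n_rows)))

# PCA followed by between-class analysis, keeping 3 axes as in the report
pca1 <- dudi.pca(X, scannf = FALSE, nf = 3)
bca1 <- bca(pca1, fac, scannf = FALSE, nf = 3)

# Single-core leave-one-out cross-validation, timed
t_single <- system.time(cv1 <- loocv(bca1, nax = 0, parallel = FALSE))
print(t_single)

# System details worth attaching to a bug report
print(sessionInfo())
```

Increasing `n_rows`, `n_cols` and `n_grp` step by step, and switching `parallel` to `TRUE`, may help locate the size at which a failure starts to appear.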
We ran the parallel analysis on 3 computers:
(the machine we last used for the non-parallel run)
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ade4_1.7-22
loaded via a namespace (and not attached):
[1] MASS_7.3-51.5 compiler_3.6.3 Rcpp_1.0.9
(parallel run, which produced the errors)
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.8 ade4_1.7-22
loaded via a namespace (and not attached):
[1] Rcpp_1.0.11 codetools_0.2-18 prettyunits_1.2.0 foreach_1.5.2
[5] crayon_1.5.2 MASS_7.3-55 R6_2.5.1 lifecycle_1.0.4
[9] rlang_1.1.2 progress_1.2.2 cli_3.6.1 doParallel_1.0.17
[13] vctrs_0.6.4 iterators_1.0.14 hms_1.1.3 parallel_4.1.2
[17] compiler_4.1.2 pkgconfig_2.0.3
(non-parallel computing)
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950 LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
time zone: Asia/Taipei
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.8 ade4_1.7-22
loaded via a namespace (and not attached):
[1] R6_2.5.1 codetools_0.2-19 doParallel_1.0.17 iterators_1.0.14 parallel_4.3.0
[6] pkgconfig_2.0.3 lifecycle_1.0.3 cli_3.6.1 foreach_1.5.2 vctrs_0.6.2
[11] compiler_4.3.0 prettyunits_1.1.1 tools_4.3.0 hms_1.1.3 Rcpp_1.0.10
[16] rlang_1.1.1 crayon_1.5.2 progress_1.2.2 MASS_7.3-58.4
I've noticed that the error didn't occur during the parallel processing stage, but could it be somehow related to the mclapply warnings? For example, the way groups and sample sizes were split and distributed for parallel processing might have failed to meet certain conditions (just a guess on my part). The non-parallel run finished yesterday, and we got great results, which leads me to believe that the error was not directly caused by the computation of the group overlap index.
Hi,
First off, thanks a lot for this package.
I ran into some issues while running loocv in parallel mode with a bca result of 17,073 rows. After about 9 hours, I got these error messages:
Could you help me figure out what's going wrong?
Thanks!