EuracBiomedicalResearch / FamAgg

This is the development version of the FamAgg Bioconductor package.
https://EuracBiomedicalResearch.github.io/FamAgg
MIT License

Reduce memory footprint in simulation runs. #26

Closed by the-x-at 3 years ago

the-x-at commented 3 years ago

A complete simulation run is now broken down into manageable chunks, currently 1,000 simulations each. Data obtained from each chunk are added incrementally to dedicated data structures. Histograms and densities in the result objects are affected by this change and will appear slightly different. This fixes issue #22. R CMD check finishes without complaints.
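To illustrate the chunking idea, here is a minimal Python sketch. It is not the FamAgg R code. The names `chunk_sizes`, `simulate_chunked` and `toy_statistic` are hypothetical, and so are the fixed histogram range and the simple empirical p-value. The point is only that each chunk's raw values are discarded after being folded into small accumulators, so peak memory depends on the chunk size rather than on the total number of simulations.

```python
import random


def chunk_sizes(nsim, chunk=1000):
    """Yield sizes of consecutive chunks that together cover nsim runs."""
    full, rest = divmod(nsim, chunk)
    for _ in range(full):
        yield chunk
    if rest:
        yield rest


def toy_statistic(rng):
    """Hypothetical stand-in for one simulated test statistic."""
    return sum(rng.random() for _ in range(5)) / 5


def simulate_chunked(observed, nsim, stat, chunk=1000,
                     bins=20, lo=0.0, hi=1.0, seed=1):
    """Run nsim simulations chunk by chunk, keeping only summaries."""
    rng = random.Random(seed)
    counts = [0] * bins          # histogram with bins fixed in advance
    exceed = 0                   # simulated values >= observed value
    width = (hi - lo) / bins
    for size in chunk_sizes(nsim, chunk):
        # Only this chunk's values are held in memory at any time.
        values = [stat(rng) for _ in range(size)]
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
            if v >= observed:
                exceed += 1
    return {"counts": counts, "exceed": exceed, "pvalue": exceed / nsim}


result = simulate_chunked(observed=0.6, nsim=2500, stat=toy_statistic)
print(result["pvalue"], sum(result["counts"]))
```

In this sketch the histogram bins must be chosen before all values are known. That is one plausible reason, assumed here rather than confirmed by the source, why incrementally built histograms can look slightly different from ones computed on the full sample.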

Tests corrected: Kinship sum test, Kinship group test.

Test not corrected: Familial incidence rate test.

Test not affected by this issue: Genealogical index of familiality test.

I have repeated the familial aggregation runs presented in the memory consumption table in issue #22. For documentation purposes, each run is a command-line call using the chris-famagg package from the internal GitLab of the Institute for Biomedicine (EURAC):

maxmem time /home/shared/bioinf/R/bin/Rscript-4.0-BioC3.12 src/R/famaggize.R --skipkin --trait bu02-up-x0lp17 --method ks --plotter ks2paint --pedformat png --dataset c13000 --nsim N

where N is the number of simulation runs given in the following table:

| N | Time (s) | Max. mem (KB) | Time (s, fix) | Max. mem (KB, fix) |
|---|---|---|---|---|
| 1,000 | 61 | 642,848 | 68 | 593,224 |
| 2,000 | 92 | 564,812 | 102 | 594,156 |
| 4,000 | 132 | 619,124 | 138 | 625,572 |
| 8,000 | 236 | 731,824 | 255 | 656,464 |
| 16,000 | 437 | 900,644 | 463 | 585,188 |
| 32,000 | 799 | 1,368,572 | 725 | 645,312 |
| 64,000 | 1,400 | 2,558,352 | 1,553 | 610,064 |
| 128,000 | 3,088 | 5,054,656 | 3,001 | 624,660 |
| 256,000 | 5,794 | 8,911,296 | 6,109 | 647,648 |
| 512,000 | 12,445 | 15,579,764 | 11,814 | 638,508 |
| 1,024,000 | 24,803 | 34,090,076 | NA | NA |

The first three columns are as in issue #22; the last two columns contain values measured with the memory fix presented here. With the fix, maximum memory stays between roughly 585,000 and 657,000 KB for every N measured, whereas without it memory grows to more than 15 GB at N = 512,000. Runtime is barely affected; the fluctuations are due to the machine's workload at the time of the experiment. There appears to be a small overhead, expressed in runtime increases of up to about 12%, although three runs were faster with the fix (N = 32,000: -9%, N = 128,000: -3%, N = 512,000: -5%). We have refrained from running a million simulations, simply because the trend is so evident and to save time, resources, energy, and the planet.
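The relative runtime change and the memory range can be recomputed directly from the table. The short Python sketch below does this arithmetic. The data are copied from the table (the N = 1,024,000 row is omitted because it has no measurement with the fix), and the helper name `relative_change` is introduced here for illustration only.

```python
# (N, time before [s], max mem before [KB], time with fix [s], max mem with fix [KB])
rows = [
    (1_000, 61, 642_848, 68, 593_224),
    (2_000, 92, 564_812, 102, 594_156),
    (4_000, 132, 619_124, 138, 625_572),
    (8_000, 236, 731_824, 255, 656_464),
    (16_000, 437, 900_644, 463, 585_188),
    (32_000, 799, 1_368_572, 725, 645_312),
    (64_000, 1_400, 2_558_352, 1_553, 610_064),
    (128_000, 3_088, 5_054_656, 3_001, 624_660),
    (256_000, 5_794, 8_911_296, 6_109, 647_648),
    (512_000, 12_445, 15_579_764, 11_814, 638_508),
]


def relative_change(before, after):
    """Percentage change from before to after (positive means an increase)."""
    return 100.0 * (after - before) / before


for n, t_old, m_old, t_new, m_new in rows:
    print(f"N={n:>7,}  runtime {relative_change(t_old, t_new):+5.1f}%  "
          f"memory {relative_change(m_old, m_new):+6.1f}%")

mem_with_fix = [m_new for _, _, _, _, m_new in rows]
print("memory with fix (KB): min", min(mem_with_fix), "max", max(mem_with_fix))
```

Given how much the runtime varies from run to run, per-row percentages like these are best read as rough indications rather than precise overhead figures.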

the-x-at commented 3 years ago

Improved the items discussed. Looks better now. Also de-jo-ified said code chunk. There is some problem with the package version number, which is also why the commit action fails under Linux. The code is validated with R CMD check in R-4.1.0 with Bioconductor 3.13.

jorainer commented 3 years ago

I also updated the unit tests and bumped the version. Will merge after the unit tests are done.

jorainer commented 3 years ago

Looks like the Windows tests fail because the binaries for BioC 3.14 are offline.