famuvie / breedR

Statistical methods for forest genetic resources analysts
http://famuvie.github.io/breedR/
GNU General Public License v3.0

Excessive memory consumption building AR model #37

JanKowalczyk closed this issue 9 years ago

JanKowalczyk commented 9 years ago

When running the model listed below, the system ran out of memory.

res.ar1 <- remlf90(fixed = phe_X ~ poch,
                   # random = ~ block,
                   genetic = list(model = 'add_animal',
                                  pedigree = data[, c('self', 'dad', 'mum')],
                                  id = 'self'),
                   spatial = list(model = 'AR',
                                  coord = data[, c('x', 'y')],
                                  rho = c(.8, .8)),
                   data = data,
                   method = 'em')

I found the solution: it is important to use the 64-bit version of R in order to use all system memory (more than 4 GB).
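
A quick way to check which build of R is running is sketched below. It relies only on base R (`.Machine$sizeof.pointer` and `R.version$arch`); the variable names `ptr_bytes` and `is_64bit` are just illustrative.

```r
# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build
ptr_bytes <- .Machine$sizeof.pointer
is_64bit  <- ptr_bytes == 8

# Report the architecture string and the result of the check
cat("Architecture:", R.version$arch, "\n")
cat("64-bit R:", is_64bit, "\n")
```

On a 64-bit Windows installation that ships both builds, this should make it clear whether the 32-bit or the 64-bit one was actually started.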

famuvie commented 9 years ago

Thank you for your report, Jan. How big was your data?

Indeed, 64-bit systems can address more memory. Nonetheless, I need to optimise breedR to handle memory as efficiently as possible.

It has not been a problem so far, so I did not pay much attention to this. I will leave the issue open until I conduct this optimisation of the code.

JanKowalczyk commented 9 years ago

The data are not big:

> dim(data)
[1] 6084   11

The row and column extents are:

> max(data$x)
[1] 126
> max(data$y)
[1] 84

See the plot below. [image: rplot07]

famuvie commented 9 years ago

Indeed, this is stranger still, as we have often dealt with data of this size.

Would you report the result of the following commands, when run under 32-bit R?

sessionInfo()
memory.size()
memory.size(max = TRUE)

tar <- breedR:::breedr_ar(
          model = 'AR',
          coordinates = data[, c('x','y')] ,
          rho = c(.8,.8),
          var.ini = 1,
          autofill = TRUE,
          sparse = TRUE)
str(tar)
object.size(tar)

Thanks!

JanKowalczyk commented 9 years ago

Here it is:

> library(breedR)
Loading required package: sp

Warning message:
In build_pedigree(1:3, data = ped.df) :
  The pedigree has been recoded. Check attr(ped, 'map').
Error in Matrix::tril(as(as.matrix(x), "dgTMatrix")) : 
  error in evaluating the argument 'x' in selecting a method for function 'tril': Error: cannot allocate vector of size 427.3 Mb

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] breedR_0.10-8 sp_1.1-1     

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6      magrittr_1.5     maps_2.3-9       splines_3.2.1    MASS_7.3-40      munsell_0.4.2    colorspace_1.2-6
 [8] lattice_0.20-31  minqa_1.2.4      stringr_1.0.0    plyr_1.8.3       fields_8.2-1     tools_3.2.1      grid_3.2.1      
[15] spam_1.0-1       gtable_0.1.2     nlme_3.1-120     lme4_1.1-8       digest_0.6.8     Matrix_1.2-1     nloptr_1.0.4    
[22] reshape2_1.4.1   ggplot2_1.0.1    stringi_0.5-5    scales_0.2.5     proto_0.3-10    
> memory.size()
[1] 332.6
> memory.size(max = TRUE)
[1] 335.69
> tar <- breedR:::breedr_ar(
+ model = 'AR',
+ coordinates = data[, c('x','y')] ,
+ rho = c(.8,.8),
+ var.ini = 1,
+ autofill = TRUE,
+ sparse = TRUE)
> str(tar)
List of 5
 $ incidence.matrix:Formal class 'indMatrix' [package "Matrix"] with 4 slots
  .. ..@ perm    : int [1:6084] 127 757 883 1009 1135 1261 1387 1891 2143 3277 ...
  .. ..@ Dim     : int [1:2] 6084 10584
  .. ..@ Dimnames:List of 2
  .. .. ..$ : NULL
  .. .. ..$ : NULL
  .. ..@ factors : list()
 $ structure.matrix:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:94000] 0 1 126 127 0 1 2 126 127 128 ...
  .. ..@ p       : int [1:10585] 0 4 10 16 22 28 34 40 46 52 ...
  .. ..@ Dim     : int [1:2] 10584 10584
  .. ..@ Dimnames:List of 2
  .. .. ..$ : NULL
  .. .. ..$ : NULL
  .. ..@ x       : num [1:94000] 7.72 -6.17 -6.17 4.94 -6.17 ...
  .. ..@ factors : list()
 $ structure.type  : chr "precision"
 $ coordinates     :'data.frame':   6084 obs. of  2 variables:
  ..$ x: int [1:6084] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ y: int [1:6084] 2 7 8 9 10 11 12 16 18 27 ...
 $ param           :List of 1
  ..$ rho: num [1:2] 0.8 0.8
 - attr(*, "class")= chr [1:4] "ar" "spatial" "random" "breedr_effect"
 - attr(*, "grid")=List of 6
  ..$ origin  : Named num [1:2] 1 1
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
  ..$ step    : Named num [1:2] 1 1
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
  ..$ length  : Named num [1:2] 126 84
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
  ..$ idx     : int [1:6084] 127 757 883 1009 1135 1261 1387 1891 2143 3277 ...
  ..$ regular : logi TRUE
  ..$ autofill: logi TRUE
> object.size(tar)
1271496 bytes

famuvie commented 9 years ago

The size of the data is misleading. If you have many 'holes' in your spatial arrangement, the dimension of the autoregressive effect can become much greater than N (here, 126 x 84 = 10584 grid nodes for 6084 observations). This was badly affecting the autoregressive model, because I had some hidden conversions to non-sparse matrix formats that took up lots of memory. This is fixed now.
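
A back-of-the-envelope sketch of why a dense conversion hurts, using only figures from the `str(tar)` output earlier in this thread. The per-entry sizes (8 bytes for doubles, 4 bytes for integers or logicals, 4 bytes for sparse indices) are standard R storage sizes. That the failed allocation was a dense 10584 x 10584 matrix is an assumption based on the matching size; the thread does not confirm it.

```r
# Figures taken from the str(tar) output earlier in this thread
n_obs   <- 6084                 # observations (rows of data)
grid    <- c(x = 126, y = 84)   # grid extent reported in attr(, "grid")$length
n_nodes <- prod(grid)           # grid nodes, holes included
nnz     <- 94000                # non-zero entries of the sparse precision matrix

# Memory for a dense n_nodes x n_nodes matrix, in Mb (2^20 bytes)
dense_double_mb <- n_nodes^2 * 8 / 2^20   # 8-byte doubles
dense_small_mb  <- n_nodes^2 * 4 / 2^20   # 4-byte integers or logicals

# Rough memory for the column-compressed sparse matrix:
# values (8 bytes) and row indices (4 bytes) per non-zero, plus column pointers
sparse_mb <- (nnz * (8 + 4) + (n_nodes + 1) * 4) / 2^20

print(c(observations = n_obs, grid_nodes = n_nodes))
print(round(c(dense_double = dense_double_mb,
              dense_4byte  = dense_small_mb,
              sparse       = sparse_mb), 1))
```

The 4-byte dense figure comes to about 427.3 Mb, which matches the size of the allocation that failed under 32-bit R, whereas the sparse representation needs on the order of 1 Mb.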

Jan, if you have a moment, would you repeat your test and verify that it runs better? Thanks.

JanKowalczyk commented 9 years ago

Thank you, I repeated the test and now it is OK.