mincrit: problem with errors & aborting R session

emmcarr commented 8 years ago

Hi, Thanks for creating abctools - it seems like a great package to use in combination with the abc package. I have 32 summary statistics and am trying to use mincrit or AS.select to subset, since Blum et al (2013) suggest this approach with a smaller number of summary statistics. Unfortunately when I use mincrit with one row of data, I get this error:

scen3 <-mincrit(obs=target2, param=IslConM.params[1,], sumstats=sumstatIslConM2[1, ],

tol=.01, method="rejection", do.crit=TRUE) doing statistics: 1 ( 1 / 1 ) Error in sumstats[, I] : incorrect number of dimensions In addition: Warning message: In max(do.only) : no non-missing arguments to max; returning -Inf

The observed data (target2) and simulated data (sumstatIslConM2) have the same number of columns and column names, and both are matrices.When I try to run the command for more data, it hangs and then the R session is aborted, with no error given.

If you could please give me some advice on how to correct this error that would be great. I've updated all the packages used in abc and abctools and restarted the computer, with no improvement seen.

thanks, Emma

dennisprangle commented 8 years ago

Hi Emma, thanks for the feedback! These methods are only useful when you have a large number of rows, so I would expect problems if you use only a single row. We should probably generate a helpful error message in this case though.

The problem with R crashing for more rows of data sounds more serious. Do you have any short example code you can post or link to that reproduces the problem?

Edit: also I've spoken to my co-author and he suggests there can be memory allocation problems with 32 summary statistics (as there are a very large number of possible subsets). Setting do.crit=FALSE might avoid some of them.

emmcarr commented 8 years ago

Hi Dennis, sure, here is an example from one of the four scenarios I'm investigating. I've also attached 10K sims and params for this scenario, the full run has been 100K sims for each scenario. cheers Emma

Importing scenario 3: isolation and contact : one post-contact migration rate IslConM.out<-read.delim(file="IslConM_10Koutss.txt", sep = "\t", header=TRUE) IslConM.params<-read.delim(file="IsoContMR_10K.params.txt", sep = "\t", header=TRUE, row.names=NULL)

changing to matrices as data are all numeric & it makes writing function easier IslConM.out<-as.matrix(IslConM.out) IslConM.params<-as.matrix(IslConM.params)

sumstatIslConM2=IslConM.out[,c('K_1' , 'K_2' , 'Ksd_1' , 'Ksd_2' , 'mean_K' , 'sd_K' , 'tot_K' , 'H_1' , 'H_2' , 'Hsd_1' , 'Hsd_2' , 'mean_H' , 'sd_H' , 'tot_H' , 'GW_1' , 'GW_2' , 'GWsd_1' , 'GWsd_2' , 'mean_GW' , 'sd_GW' , 'tot_GW' , 'R_1' , 'R_2' , 'Rsd_1' , 'Rsd_2' , 'mean_R' , 'sd_R' , 'tot_R' , 'FIS' , 'FST' , 'FIT' , 'FST_2_1')]

target from observed statistics target<-matrix(nrow=1, ncol=32, c(9.58824,9.82353,3.37377,4.59939,9.70588,0.166378,11.1765,0.775234,0.76437,0.0951308,0.117777,0.769802,0.00768214,0.774675,0.412155262,0.398922568,0.132718982,0.126791019,0.405538915,0.009356928,0.425559475,25.82352941,25.23529412,16.07107376,14.56678333,25.52941176,0.415945165, 27.70588235,0.0410882,0.0121871,0.0527745,0.012729))

colnames(target)<-c('K_1' , 'K_2' , 'Ksd_1' , 'Ksd_2' , 'mean_K' , 'sd_K' , 'tot_K' , 'H_1' , 'H_2' , 'Hsd_1' , 'Hsd_2' , 'mean_H' , 'sd_H' , 'tot_H' , 'GW_1' , 'GW_2' , 'GWsd_1' , 'GWsd_2' , 'mean_GW' , 'sd_GW' , 'tot_GW' , 'R_1' , 'R_2' , 'Rsd_1' , 'Rsd_2' , 'mean_R' , 'sd_R' , 'tot_R' , 'FIS' , 'FST' , 'FIT' , 'FST_2_1')

scen4 <-mincrit(obs=target, param=IslConMwM.params, sumstats=sumstatIslConM2, tol=.01, method="rejection", do.crit=TRUE) IslConM_10Koutss.txt IsoContMR_10K.params.txt

dennisprangle commented 8 years ago

Thanks for these details, we'll look into it.

In the meantime I've added a check + error message to mincrit for the case where there are too few simulations (commit e54f12b).

dennisprangle commented 8 years ago

Ok we've had a look at this. One issue is that IslConM.params[,"X"] is made up of NAs. But the main issue is that it's too time consuming to search through all 2^32=10^9 subsets of summary statistics. Try using the limit and do.only arguments to reduce the number of subsets that you search. These are described on page 6/7 of the abctools paper https://journal.r-project.org/archive/2015-2/nunes-prangle.pdf. Also, it might be worth invesigating some of the projection methods for choosing summary statistics.

dennisprangle / abctools

mincrit: problem with errors & aborting R session #2