Closed emmcarr closed 7 years ago
Hi Emma, thanks for the feedback! These methods are only useful when you have a large number of rows, so I would expect problems if you use only a single row. We should probably generate a helpful error message in this case though.
The problem with R crashing for more rows of data sounds more serious. Do you have any short example code you can post or link to that reproduces the problem?
Edit: also I've spoken to my co-author and he suggests there can be memory allocation problems with 32 summary statistics (as there are a very large number of possible subsets). Setting do.crit=FALSE
might avoid some of them.
Hi Dennis, sure, here is an example from one of the four scenarios I'm investigating. I've also attached 10K sims and params for this scenario, the full run has been 100K sims for each scenario. cheers Emma
Importing scenario 3: isolation and contact : one post-contact migration rate IslConM.out<-read.delim(file="IslConM_10Koutss.txt", sep = "\t", header=TRUE) IslConM.params<-read.delim(file="IsoContMR_10K.params.txt", sep = "\t", header=TRUE, row.names=NULL)
changing to matrices as data are all numeric & it makes writing function easier IslConM.out<-as.matrix(IslConM.out) IslConM.params<-as.matrix(IslConM.params)
sumstatIslConM2=IslConM.out[,c('K_1' , 'K_2' , 'Ksd_1' , 'Ksd_2' , 'mean_K' , 'sd_K' , 'tot_K' , 'H_1' , 'H_2' , 'Hsd_1' , 'Hsd_2' , 'mean_H' , 'sd_H' , 'tot_H' , 'GW_1' , 'GW_2' , 'GWsd_1' , 'GWsd_2' , 'mean_GW' , 'sd_GW' , 'tot_GW' , 'R_1' , 'R_2' , 'Rsd_1' , 'Rsd_2' , 'mean_R' , 'sd_R' , 'tot_R' , 'FIS' , 'FST' , 'FIT' , 'FST_2_1')]
target from observed statistics target<-matrix(nrow=1, ncol=32, c(9.58824,9.82353,3.37377,4.59939,9.70588,0.166378,11.1765,0.775234,0.76437,0.0951308,0.117777,0.769802,0.00768214,0.774675,0.412155262,0.398922568,0.132718982,0.126791019,0.405538915,0.009356928,0.425559475,25.82352941,25.23529412,16.07107376,14.56678333,25.52941176,0.415945165, 27.70588235,0.0410882,0.0121871,0.0527745,0.012729))
colnames(target)<-c('K_1' , 'K_2' , 'Ksd_1' , 'Ksd_2' , 'mean_K' , 'sd_K' , 'tot_K' , 'H_1' , 'H_2' , 'Hsd_1' , 'Hsd_2' , 'mean_H' , 'sd_H' , 'tot_H' , 'GW_1' , 'GW_2' , 'GWsd_1' , 'GWsd_2' , 'mean_GW' , 'sd_GW' , 'tot_GW' , 'R_1' , 'R_2' , 'Rsd_1' , 'Rsd_2' , 'mean_R' , 'sd_R' , 'tot_R' , 'FIS' , 'FST' , 'FIT' , 'FST_2_1')
scen4 <-mincrit(obs=target, param=IslConMwM.params, sumstats=sumstatIslConM2, tol=.01, method="rejection", do.crit=TRUE) IslConM_10Koutss.txt IsoContMR_10K.params.txt
Thanks for these details, we'll look into it.
In the meantime I've added a check + error message to mincrit
for the case where there are too few simulations (commit e54f12b).
Ok we've had a look at this. One issue is that IslConM.params[,"X"]
is made up of NA
s. But the main issue is that it's too time consuming to search through all 2^32=10^9 subsets of summary statistics. Try using the limit
and do.only
arguments to reduce the number of subsets that you search. These are described on page 6/7 of the abctools paper https://journal.r-project.org/archive/2015-2/nunes-prangle.pdf. Also, it might be worth invesigating some of the projection methods for choosing summary statistics.
Hi, Thanks for creating abctools - it seems like a great package to use in combination with the abc package. I have 32 summary statistics and am trying to use mincrit or AS.select to subset, since Blum et al (2013) suggest this approach with a smaller number of summary statistics. Unfortunately when I use mincrit with one row of data, I get this error:
The observed data (target2) and simulated data (sumstatIslConM2) have the same number of columns and column names, and both are matrices.When I try to run the command for more data, it hangs and then the R session is aborted, with no error given.
If you could please give me some advice on how to correct this error that would be great. I've updated all the packages used in abc and abctools and restarted the computer, with no improvement seen.
thanks, Emma