bfarr / kombine

A kernel-density-based, embarrassingly parallel ensemble sampler
http://pages.uoregon.edu/bfarr/kombine
MIT License
34 stars 25 forks source link

Allow kde_size < nwalkers #12

Closed cdcapano closed 7 years ago

cdcapano commented 8 years ago

This patch allows one to set the kde_size < nwalkers. In that case, multiple kdes are created, with subsets of walkers assigned to each kde. For example, if nwalkers is 5000 and kde_size is 500, 10 kdes are created, with the first one using the first 500 walkers, the next 500 the second, etc. Doing this necessitates some changes to how Sampler stores kdes. Now, _kde is a list of kdes. Also, max_samples cannot be passed when calling run_mcmc. Instead, kde_size is set once when the sampler class is initialized.

The reason for doing this is I've found that calls to the kde can become fairly slow when using a large number of walkers. Here's a plot of the amount of time it takes to call the kde (measured by sticking time.time() calls just before and after the kde is called in _GetLnProbWrapper) using 5000 walkers (note: you need LVC credentials to view): https://www.atlas.aei.uni-hannover.de/~cdcapano/LSC/projects/pycbc_inference/cbc/gw150914/kombine/5kwalkers_1kiterations/kde-all_timing.png

The plot shows the min, max, and mean averaged over all of the walkers at each iteration during burn-in; dashed line shows the update intervals. Here is the same plot after this patch, setting the kde size to 500:

https://www.atlas.aei.uni-hannover.de/~cdcapano/LSC/projects/pycbc_inference/cbc/gw150914/kombine/5kwalkers_1kiterations-500kde_pts/kde-all_timing.png

Here is the resulting posteriors on GW150914 from this later run:

https://www.atlas.aei.uni-hannover.de/~cdcapano/LSC/projects/pycbc_inference/cbc/gw150914/kombine/5kwalkers_1kiterations-500kde_pts/posteriors-scatter.png

I don't think it's necessary to make the resolution of the kde be the same as the number of walkers, just that the draws from a proposal distribution be independent for each walker. This does mean that each subset of walkers sees a slightly different proposal distribution. I think that should be ok, but I'm not an expert on MCMCs, so please correct me if I'm wrong.

I'm also new to this package. I tried to match coding styles; apologies if I've messed something up.

codecov-io commented 8 years ago

Current coverage is 36.85% (diff: 11.53%)

Merging #12 into master will decrease coverage by 1.25%

@@             master        #12   diff @@
==========================================
  Files             7          7          
  Lines           656        681    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            250        251     +1   
- Misses          406        430    +24   
  Partials          0          0          

Powered by Codecov. Last update 4616db6...6a5a124

cdcapano commented 7 years ago

Closing this. Not sure it's the correct thing to do, plus PR #13 would require some changes to this. May revisit later.