
ELFI - Engine for Likelihood-Free Inference
http://elfi.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Posterior methodologies with Random Forests #319

Open fradav opened 4 years ago

fradav commented 4 years ago

Summary:

I am currently testing a Python module wrapping https://github.com/diyabc/abcranger: posterior methodologies (model choice and parameter estimation) with Random Forests on a reference table (see the references below).

Description:

I would like to know the best way to integrate these posterior methodologies into the ELFI pipeline. It seems every inference method in ELFI is expected to have an "iterate" method that runs with every new batch of samples, but neither methodology has one: they need the whole reference table at once.

See the demos at https://github.com/diyabc/abcranger/blob/master/testpy/Model%20Choice%20Demo.ipynb and https://github.com/diyabc/abcranger/blob/master/testpy/Parameter%20Estimation%20Demo.ipynb

Note that the basic rejection sampler is more than enough for these methodologies (and the threshold parameter barely matters).
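For concreteness, here is a minimal sketch of the pipeline I have in mind, on a toy Gaussian model (the simulator, node names, and settings are purely illustrative, not our actual model):

```python
import numpy as np
import elfi

# Toy simulator: 50 Gaussian observations around a single location parameter.
def simulator(mu, batch_size=1, random_state=None):
    random_state = random_state or np.random
    return random_state.normal(mu, 1, size=(batch_size, 50))

def mean_summary(y):
    return np.mean(y, axis=1)

y_obs = simulator(1.0)

mu = elfi.Prior('uniform', -5, 10)
sim = elfi.Simulator(simulator, mu, observed=y_obs)
S1 = elfi.Summary(mean_summary, sim)
d = elfi.Distance('euclidean', S1)

# Plain rejection sampling; with the RF methodologies the acceptance
# threshold barely matters, so a loose quantile is fine.
rej = elfi.Rejection(d, batch_size=1000, seed=0)
result = rej.sample(5000, quantile=0.5)
```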

Regards,

hpesonen commented 4 years ago

Hi! Could you clarify a bit what you mean by posterior methodologies and integrating them into the ELFI pipeline? For example, would you like to implement RF-ABC within ELFI? In that case, iterate could still be used when producing the reference table in batches.

Note that if you don't care about the threshold for rejection ABC and only want to generate a reference table from the ELFI model, you can also set quantile = 1.0 in the sample method.
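For example, reusing the toy model sketched above (the node names mu, S1, d come from that sketch):

```python
# quantile=1.0 accepts every simulation, so the result is effectively the
# raw reference table. output_names makes the summaries available as well.
rej = elfi.Rejection(d, output_names=['S1'], batch_size=1000, seed=0)
result = rej.sample(5000, quantile=1.0)

params = result.samples['mu']     # parameter draws
summaries = result.outputs['S1']  # corresponding summary statistics
# Together these form the reference table an RF method would consume.
```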

fradav commented 4 years ago

Hi,

I'm working with J.-M. Marin, and posterior RF methodologies such as model choice and parameter estimation work directly on ABC reference tables, as stated in the references below.

By integration in ELFI, I originally meant implementing a new inference method, as documented in ELFI's guide on implementing new methods; a rough sketch of what I have in mind follows.
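This follows the ParameterInference pattern from the documentation; RFPosterior, summary_name, and the sklearn forest are placeholders, not the actual abcranger wrapper:

```python
import numpy as np
import elfi
from sklearn.ensemble import RandomForestRegressor  # placeholder for abcranger

class RFPosterior(elfi.methods.parameter_inference.ParameterInference):
    """Accumulate the whole reference table, then train a forest once."""

    def __init__(self, model, summary_name, **kwargs):
        output_names = [summary_name] + model.parameter_names
        super(RFPosterior, self).__init__(model, output_names, **kwargs)
        self.summary_name = summary_name
        self.state['summaries'] = []
        self.state['params'] = []

    def set_objective(self, n_sim):
        # Stop once n_sim simulations have been generated.
        self.objective['n_sim'] = n_sim

    def update(self, batch, batch_index):
        super(RFPosterior, self).update(batch, batch_index)
        # No per-batch inference here: just stash the batch outputs.
        self.state['summaries'].append(batch[self.summary_name])
        self.state['params'].append(
            np.column_stack([batch[p] for p in self.parameter_names]))

    def extract_result(self):
        # Only now train, on the full reference table at once.
        X = np.concatenate(self.state['summaries'], axis=0)
        X = X.reshape(X.shape[0], -1)  # ensure 2-D (n_sim, n_summaries)
        y = np.concatenate(self.state['params'], axis=0)
        rf = RandomForestRegressor(n_estimators=500, n_jobs=-1)
        rf.fit(X, y.ravel() if y.shape[1] == 1 else y)
        return rf  # a real implementation would wrap this in an ELFI result
```

Usage would then be something like RFPosterior(model, 'S1', batch_size=1000).infer(n_sim=10000).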

I am not sure about batch processing. RF-ABC prediction performance degrades a lot if the forest is trained on only a small subset of the data. Nor do I see how to "accumulate" posterior results from successive batches other than by retraining a forest on all past batches, which of course defeats the purpose of batching (see the sketch after this paragraph). This is perhaps a use case for Mondrian forests (Lakshminarayanan, Roy, and Teh 2014), but those are a totally different beast from the classical Breiman forests we use, and they come with many caveats (sensitivity to noise is one of them). Anyway, this is an interesting track for future work.
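Concretely, the only safe accumulation scheme I can see with Breiman forests looks like this (a toy sketch; fake_batch stands in for one batch of summaries and parameters from the sampler):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def fake_batch(n=1000):
    # Stand-in for one batch of (summaries, parameters).
    theta = rng.uniform(-5, 5, size=n)
    s = theta[:, None] + rng.normal(0, 1, size=(n, 3))
    return s, theta

X_seen, y_seen = [], []
for batch_index in range(10):
    X, y = fake_batch()
    X_seen.append(X)
    y_seen.append(y)
    # Retrain from scratch on everything seen so far: correct, but the cost
    # grows with every batch, which defeats the purpose of batching.
    rf = RandomForestRegressor(n_estimators=100, n_jobs=-1)
    rf.fit(np.vstack(X_seen), np.concatenate(y_seen))
```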

The threshold doesn't matter "much" with RF-ABC, but that doesn't mean we shouldn't have one, so I don't think quantile = 1.0 is recommended either (I'll double-check this with J.-M. Marin).

References

Pudlo, Pierre, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, and Christian P. Robert. 2015. “Reliable ABC Model Choice via Random Forests.” *Bioinformatics* 32 (6): 859–66.
Raynal, Louis, Jean-Michel Marin, Pierre Pudlo, Mathieu Ribatet, Christian P. Robert, and Arnaud Estoup. 2018. “ABC Random Forests for Bayesian Parameter Inference.” *Bioinformatics* 35 (10): 1720–8.
Lakshminarayanan, Balaji, Daniel M. Roy, and Yee Whye Teh. 2014. “Mondrian Forests: Efficient Online Random Forests.” In *Advances in Neural Information Processing Systems*, 3140–8.