cdslaborg / paramonte

ParaMonte: Parallel Monte Carlo and Machine Learning Library for Python, MATLAB, Fortran, C++, C.
https://www.cdslab.org/paramonte

Priors and Hyperpriors #28

Open arwelHughes opened 1 year ago

arwelHughes commented 1 year ago

Could we have a general discussion / examples of how one might implement priors and hyperpriors in the objective function somewhere in the documentation?

shahmoradi commented 1 year ago

@arwelHughes Thanks for the feedback. Good idea. It may take some time to add this to the documentation, as people are currently focused on getting the next release out, which is long overdue. Based on my experience, I recommend merging the priors with the likelihood function to create a single objective function to explore/sample. This makes sampling much easier. Although priors should, in theory, facilitate sampling, in practice they are frequently objective/non-informative, meaning that they add little information and do little to simplify navigating the parameter space. But now that this has been brought up, we will consider it more carefully to gauge the benefits of explicit prior specification. Hyperpriors and hyperparameters typically appear in hierarchical modeling problems and require multi-layered sampling strategies. If you have specific examples where this issue appears, please share them with us so that we can hopefully develop a better solution than what is currently possible.
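A minimal sketch of the "single objective function" idea, in plain Python with no sampler attached. Everything in it is an assumption made for illustration: the Gaussian model, the data values, the prior scales, and the names `get_log_func`, `log_likelihood`, and `log_prior` are hypothetical and are not taken from the ParaMonte API. The hyperparameter `log_tau` is simply appended to the sampled point, which is one possible treatment for a small hierarchical model; whether that is adequate for a given problem is not something this thread settles.

```python
import math

# Hypothetical data, for illustration only.
DATA = [1.2, 0.7, 1.9, 1.4, 0.8]


def normal_logpdf(x, mean, sd):
    """Log-density of Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z


def log_likelihood(mu, log_sigma, data):
    """Gaussian log-likelihood with mean mu and standard deviation exp(log_sigma)."""
    sigma = math.exp(log_sigma)
    return sum(normal_logpdf(x, mu, sigma) for x in data)


def log_prior(mu, log_sigma, log_tau):
    """Log-prior plus log-hyperprior (all scales are illustrative choices)."""
    tau = math.exp(log_tau)
    return (
        normal_logpdf(mu, 0.0, tau)             # prior on mu; its scale tau is a hyperparameter
        + normal_logpdf(log_sigma, 0.0, 2.0)    # prior on log_sigma
        + normal_logpdf(log_tau, 0.0, 1.0)      # hyperprior on log_tau
    )


def get_log_func(point):
    """Single objective: log-likelihood + log-prior, up to an additive constant."""
    mu, log_sigma, log_tau = point
    return log_likelihood(mu, log_sigma, DATA) + log_prior(mu, log_sigma, log_tau)
```

A function of this shape, taking a point and returning one log-density value, is what would be handed to a sampler. Working in log space means the prior is added to the likelihood term rather than multiplied.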

pglpm commented 1 year ago

May I intrude in this ticket to ask a somewhat related question: what kind of distribution domains can Paramonte handle? It doesn't seem clear from the help & manual pages. For example, can it deal with a distribution defined on the Cartesian product of the real line, the integers, and a discrete (nominal/categorical) set?

Cheers and thank you for the great project!

fagheri commented 1 year ago

@pglpm Thank you for your inquiry, and I apologize for the delayed response, as some of us have been traveling. Only continuous (real-valued) Cartesian domains are supported in the current release. We have had internal discussions about adding support for discrete domains, but these have so far not gained traction because we did not need it and no one had asked for it. If this interests you, I'd appreciate it if you could create an issue dedicated to this request to document the need and justify the effort of adding it. We have been working on a new major release that has been delayed for too long, but hopefully not for much longer. Thank you for your encouraging words! May I ask what sort of problem you are trying to solve, and in what programming language?

pglpm commented 1 year ago

@fagheri Thank you for the info! Then I'll create an issue ticket. Looking forward to the new release.

I've been working on "nonparametric density inference" for inference problems in neuroscience and medicine. In medicine it often happens that the variates are a combination of nominal, ordinal, and interval types (psychometric tests, family attributes, clinical tests, and similar). Ordinal variates can be handled with a continuous distribution by using latent variables. But nominal variates really require a sampler that works on discrete spaces.
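The latent-variable treatment of ordinal variates can be sketched as follows. This assumes one standard construction (an ordered-probit model, where a continuous latent variable is cut into categories by increasing thresholds); the thread does not say which construction is actually used, and the function names and cutpoint values are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_probs(latent_mean, cutpoints):
    """Category probabilities for a latent Normal(latent_mean, 1) variable.

    cutpoints must be increasing; K - 1 cutpoints define K ordered categories.
    """
    edges = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [std_normal_cdf(e - latent_mean) for e in edges]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


def ordinal_log_likelihood(observations, latent_mean, cutpoints):
    """Log-likelihood of observed category indices (0-based)."""
    probs = ordinal_probs(latent_mean, cutpoints)
    return sum(math.log(probs[k]) for k in observations)
```

Here `latent_mean` and the cutpoints are all continuous, so a sampler restricted to continuous domains can still explore them even though the observed data are ordinal.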

I'm trying to build a package for nonparametric density inference with this general kind of data, especially for use in medicine. The goal is to make it as user-friendly and transparent as possible (the pre-package is available here, an example application is here). I'm working in R and the Monte Carlo sampling is taken care of by the package Nimble. But eventually I'd like to write something directly in Fortran. This is why I'm interested in Paramonte.

Peku995 commented 1 year ago

I would also be very interested in this. It would be especially useful if it were possible to set some variables within the same simulation as integers and others as continuous.

fagheri commented 1 year ago

Thanks again for your feedback @Peku995 and @pglpm! Exciting work. I am sure @shahmoradi and co. would also be interested in helping and learning from this project. A new issue, perhaps with some details on a specific problem or pointers to such examples of interest (as well as any contributed by @Peku995), would be a great starting point for me on this new functionality and the interface it requires. Looking forward to it! P.S. The new release of ParaMonte will be accessible from R as well. Getting all components in place for all languages has significantly delayed the release so far. But we are very near the end now; stay tuned!

Peku995 commented 1 year ago

A concrete use case for me would be a light-scattering simulation in which the light is scattered by spheres with a size distribution. Calculating the Mie scattering is time-consuming, especially when distributions are involved. The idea is to precompute the scattering for discrete sphere sizes and store it in a database. In the simulation, the integrated scattering signal for a size distribution (e.g., a normal distribution) can then simply be calculated from the database. However, only discrete sphere sizes are available in the database, so only these discrete values should be used when calculating the distribution. This could be solved with the discrete approach in Paramonte. If the sphere scattering is superposed with scattering from fractal aggregates, the aggregate scattering can be calculated relatively easily directly in the simulation; for this, the variables in Paramonte would have to be continuous. At the same time, however, I would need discrete variables in the same simulation for the sphere size distribution taken from the database.
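The database lookup described above can be sketched as a weighted sum over the precomputed sizes. The database contents, units, and function names below are hypothetical placeholders, not real Mie results; the sketch only shows how a normal size distribution might be restricted to the sizes that exist in the database.

```python
import math

# Hypothetical precomputed database: sphere diameter (nm) -> scattering signal (arbitrary units).
SCATTER_DB = {50: 0.02, 100: 0.31, 150: 1.10, 200: 2.45, 250: 4.20}


def size_weights(mean, sd, sizes):
    """Normal density evaluated on the available sizes only, renormalized to sum to 1."""
    exponents = [-0.5 * ((s - mean) / sd) ** 2 for s in sizes]
    shift = max(exponents)  # subtract the largest exponent to avoid underflow to all zeros
    raw = [math.exp(e - shift) for e in exponents]
    total = sum(raw)
    return [w / total for w in raw]


def integrated_signal(mean, sd, db=SCATTER_DB):
    """Scattering signal of the size distribution, using database sizes only."""
    sizes = sorted(db)
    weights = size_weights(mean, sd, sizes)
    return sum(w * db[s] for w, s in zip(weights, sizes))
```

In this sketch the discreteness lives inside the objective function, while `mean` and `sd` remain continuous. It covers only the case where the sampled quantities are the parameters of the size distribution; it does not provide the discrete sampling variables requested in this thread.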

arwelHughes commented 1 year ago

I can imagine a similar use case, where we have sets of Molecular Dynamics simulations of biomembranes at, say, different pressures or salt concentrations. From these we build the model to fit our data. They form a discrete set (which could be in a database), but the final model is constructed of continuous functions. I guess the classical way of approaching this would be to use Bayesian model selection (i.e. nested sampling - hint! ;) ) to choose between the possibilities using a set of fits....
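The model-selection route can be sketched as follows, assuming that a log-evidence (log marginal likelihood) estimate is already available for each discrete candidate, for example from a separate fit per simulation condition. The function name and the log-evidence values in the usage note are hypothetical; how the evidences are obtained is outside this sketch.

```python
import math


def model_posterior_probs(log_evidences, log_model_priors=None):
    """Posterior probability of each candidate model from its log-evidence.

    If no model priors are given, all candidates are treated as equally likely a priori.
    """
    n = len(log_evidences)
    if log_model_priors is None:
        log_model_priors = [-math.log(n)] * n
    log_post = [le + lp for le, lp in zip(log_evidences, log_model_priors)]
    shift = max(log_post)  # log-sum-exp shift for numerical stability
    unnorm = [math.exp(v - shift) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

For instance, `model_posterior_probs([-120.4, -118.9, -125.0])` would rank three hypothetical salt-concentration models, with the second receiving the largest probability because it has the largest log-evidence.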