BAMresearch / probeye

A general framework for setting up parameter estimation problems.
MIT License

Solver interface via accuracy level #61

Closed TTitscher closed 2 years ago

TTitscher commented 2 years ago

Problem: Inhomogeneous inputs/outputs

It is hard to come up with a common interface for our solver classes, as they vary both in output:

and input:

As most of the outputs can be resampled to unweighted samples (and stored in an arviz.InferenceData), that issue is almost solved, and the expert user keeps access to the original results via a Solver.raw_results member.
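To illustrate the resampling step, here is a minimal sketch (not probeye's actual code) of turning weighted posterior draws, e.g. from nested sampling, into equally weighted ones that could then be wrapped via az.from_dict; the function name and signature are assumptions:

```python
import numpy as np

def resample_unweighted(samples, weights, seed=0):
    """Multinomial resampling: convert weighted draws into equally
    weighted ones suitable for az.from_dict / az.InferenceData."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()  # normalize importance weights to probabilities
    idx = rng.choice(len(samples), size=len(samples), p=p)
    return samples[idx]
```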

Proposal: Common accuracy levels

For the input, we could think of some accuracy levels similar to log levels:

The exact values need to be determined in the solver and be somehow well defined.
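As a rough analogy to logging.DEBUG/INFO/WARNING, the levels could look like the following sketch; the constant names, values, and mapping are hypothetical, not probeye's actual API:

```python
# Hypothetical accuracy-level constants, in the spirit of log levels.
COARSE = 1    # quick smoke test: few samples, short burn-in
DEFAULT = 10  # sensible trade-off for typical problems
FINE = 20     # long runs aiming at well-converged results

def describe(level):
    # each solver would map a level to its own sampler settings
    if level < DEFAULT:
        return "coarse"
    if level < FINE:
        return "default"
    return "fine"
```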

Possible interfaces

An interface would then look like

class Solver:
    raw_results = None

    def __init__(self, seed: int, show_output: bool, level: int): ...

    def run(self, **kwargs) -> az.InferenceData: ...

and they would be truly interchangeable

for SolverClass in [EmceeSolver, DynestySolver, PyroSolver]:
    samples = SolverClass(seed=0, show_output=True, level=10).run()
    probeye.pair_plot(samples)

and remain customizable:

samples = EmceeSolver(seed=0, show_output=True, level=10).run(nchains=20, ndraws=10000, ...)

Discussion

joergfunger commented 2 years ago

Why should we try to "hide" the individual settings required for a specific sampler from the user? (This relates to the input: not the inference problem, but the solver options such as burn-in, number of chains, etc.) I don't see a good way to define accuracy levels that work for any problem; e.g., the choice of the burn-in phase depends on the dimension of the variables, the priors themselves, and the problem at hand, i.e. how nonlinear the model is. As such, I would create all the solvers in your example individually (with the relevant settings defined by the user) and only then loop over them. The settings defined in the constructor might potentially be overwritten in the run call as shown, though I'm not sure if this is really necessary. The output, however, seems standard for the sampling procedures; for the variational methods we probably have to think about a different format, since there we actually sample (or optimize, as in our VB) the hyperparameters describing the posterior instead of sampling from the posterior directly.

TTitscher commented 2 years ago

First of all, I have no problem abandoning the idea.

I'd like to stress that nothing is explicitly hidden and the user is free to change the parameters. But to follow that wording: I assume that dynesty 'hides' the vol_dec argument, or emcee 'hides' the thin_by argument, to not expose all options to the user, for convenience. And in probeye, we also provide defaults. So my proposal is more like a collection of reasonable default arguments that hopefully encourage users to try different solvers.

If your concerns regard the implementation, we could put those args in a staticmethod, so within EmceeSolver there could be a

@staticmethod
def defaults(level, ndim):
    if level < 1:
        return {"nsteps": 100 * ndim, "nburn": 25 * ndim, ...}  # or ndim**2 ?
    if level < 10:
        return {"nsteps": 1000 * ndim**2, ...}

that needs to be called explicitly:

samples = EmceeSolver(...).run(**EmceeSolver.defaults(level=10, ndim=ndim))

joergfunger commented 2 years ago

Don't get me wrong, in general I think it is good to provide default values that the user can change. However, I'm not fully sure how we could set them reasonably (in my experience that is always trial and error).

TTitscher commented 2 years ago

To summarize the conversation with @joergfunger:

aklawonn commented 2 years ago

In the latest version (3.0.0), I changed the solver interface so that they all have a run method that calls their main functionality.
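A minimal sketch of what such a common run() interface could look like; the class internals here are placeholders for illustration, not the actual probeye 3.0.0 implementation:

```python
# Base class: every solver shares the constructor signature and a run()
# method; subclasses implement the actual sampling.
class Solver:
    def __init__(self, seed=0, show_output=False, level=10):
        self.seed = seed
        self.show_output = show_output
        self.level = level
        self.raw_results = None  # expert access to the original output

    def run(self, **kwargs):
        raise NotImplementedError

class EmceeSolver(Solver):
    def run(self, nsteps=1000, **kwargs):
        # a real implementation would run the emcee sampler here
        self.raw_results = {"solver": "emcee", "nsteps": nsteps}
        return self.raw_results
```

With this shape, solvers can be constructed with common arguments and still accept solver-specific options through run(**kwargs).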