UT-CHG / BET

Python package for data-consistent stochastic inverse and forward problems.
http://ut-chg.github.io/BET

New features for estimating observed and predicted distributions (other than KDE) #400

Open yentyu opened 3 years ago

yentyu commented 3 years ago

@eecsu @smattis We would like to add different options for how the observed and predicted distributions are estimated (e.g., with a Bayesian GMM) when calculating R in the data-consistent framework. To do this in a way that allows for future development, I've come up with an outline of how these features can be added to the code smoothly and consistently. @eecsu and I wanted to start a discussion to make sure the general approach seems reasonable and that we aren't overlooking anything that might break things or cause big problems with the package.

Here's the general idea. Currently, the discretization object points to three different sample_set_base objects: an input (or initial), an output (or predicted), and an observed. We want to save the estimated pdfs for observed and predicted on their respective sample_set_base objects, using _prob_type and _prob_parameters, and then call the base object's evaluate_pdf method when computing R. This has some nice benefits:

  1. It makes it easier to expand the options for estimating pdfs, since each new option is just a new _prob_type on the base object
  2. It cuts down on repeated computations for situations when you want to do a batch of inversions with the same predicted but different observed distributions (because the predicted distribution is only computed once)
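The idea above can be sketched in a few lines. This is a minimal, dependency-free illustration and not BET's implementation: `SampleSetSketch`, `set_prob`, `compute_ratio`, and the `"normal"` and `"kde"` type names are hypothetical stand-ins, and it only handles 1-D values. Only `_prob_type`, `_prob_parameters`, and `evaluate_pdf` are names taken from the discussion.

```python
from statistics import NormalDist


class SampleSetSketch:
    """Hypothetical 1-D stand-in for a sample set (not BET's sample_set_base)."""

    def __init__(self, values):
        self.values = list(values)
        self._prob_type = None
        self._prob_parameters = None

    def set_prob(self, prob_type, **prob_parameters):
        # Store how the density is described, not the evaluated density itself.
        self._prob_type = prob_type
        self._prob_parameters = prob_parameters

    def evaluate_pdf(self, points):
        # Dispatch on the stored type; a new estimator is one more branch here.
        if self._prob_type == "normal":
            dist = NormalDist(self._prob_parameters["mu"],
                              self._prob_parameters["sigma"])
            return [dist.pdf(x) for x in points]
        if self._prob_type == "kde":
            # Gaussian KDE: average of kernels centred on the stored samples.
            kernel = NormalDist(0.0, self._prob_parameters["bandwidth"])
            n = len(self.values)
            return [sum(kernel.pdf(x - v) for v in self.values) / n
                    for x in points]
        raise ValueError(f"unknown _prob_type: {self._prob_type!r}")


def compute_ratio(predicted, observed, predicted_pdf=None):
    """R = observed pdf / predicted pdf, evaluated at the predicted samples."""
    points = predicted.values
    if predicted_pdf is None:
        predicted_pdf = predicted.evaluate_pdf(points)
    observed_pdf = observed.evaluate_pdf(points)
    return [o / p for o, p in zip(observed_pdf, predicted_pdf)]


# Batch of inversions: the predicted density is evaluated once and reused.
predicted = SampleSetSketch([-1.0, 0.0, 0.5, 2.0])
predicted.set_prob("kde", bandwidth=0.5)
pred_pdf = predicted.evaluate_pdf(predicted.values)

ratios_by_mu = {}
for mu in (0.0, 1.0):
    observed = SampleSetSketch([])
    observed.set_prob("normal", mu=mu, sigma=0.25)
    ratios_by_mu[mu] = compute_ratio(predicted, observed, pred_pdf)
```

Passing the precomputed `pred_pdf` into `compute_ratio` is what captures the second benefit: only the observed density changes between inversions.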

Here is the specific outline of the changes this would involve:

Step 1: Update sample_set_base object

Step 2: Create generate_densities function

Step 3: Change invert_... Methods

smattis commented 3 years ago

Hey @yentyu. Thanks for proposing this. I totally agree that the change you are proposing is the best way to do things. @eecsu and I had discussions about this back in the summer. I hardcoded the method of using KDEs on the full space to calculate the ratios, because that was the main way that we were doing it in our work. I agree that it should instead be done using evaluate_pdf and should let you use any type of probability format (KDE, scipy.stats.random_variable, GMM, etc.). This is something that I have been wanting to do, but have not had the time due to my job and baby, so I am very glad that it is something that you want to take on. I will try to help as best I can. I will give some thoughts on your proposed steps below.

smattis commented 3 years ago

Step 1: I don't think we want to fully remove the ability to have the probability described as a product of marginals, where each marginal is a kde. This may be especially useful in problems with high-dimensional parameter spaces. Perhaps this could be renamed kde_marginals.

I totally agree that we should have a proper kde representation to go along with that.
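To make the distinction concrete, here is a rough sketch of the two representations using `scipy.stats.gaussian_kde`. The helper names (`fit_kde_marginals`, `evaluate_kde_marginals`, `fit_kde_joint`, `evaluate_kde_joint`) are made up for illustration and are not BET functions; samples are assumed to be stored as an `(n, d)` array.

```python
import numpy as np
from scipy.stats import gaussian_kde


def fit_kde_marginals(samples):
    """Fit one 1-D KDE per column of an (n, d) sample array."""
    return [gaussian_kde(samples[:, i]) for i in range(samples.shape[1])]


def evaluate_kde_marginals(marginals, points):
    """Density as a product of the 1-D marginal KDEs at (m, d) points."""
    densities = np.ones(points.shape[0])
    for i, kde in enumerate(marginals):
        densities *= kde(points[:, i])
    return densities


def fit_kde_joint(samples):
    """Fit a single d-dimensional KDE; gaussian_kde expects shape (d, n)."""
    return gaussian_kde(samples.T)


def evaluate_kde_joint(kde, points):
    """Evaluate the joint KDE at (m, d) points."""
    return kde(points.T)


rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 3))
points = samples[:5]

product_density = evaluate_kde_marginals(fit_kde_marginals(samples), points)
joint_density = evaluate_kde_joint(fit_kde_joint(samples), points)
```

The product form assumes independence between dimensions, but each factor is only a 1-D estimate, which is why it can hold up better than a full KDE when the dimension is high relative to the number of samples.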

smattis commented 3 years ago

Step 2: Yes, this should be done. I think I am mostly in agreement with the way you want to implement this. We also want to keep the ability to do a GMM where the mixture is based on Gaussians coming from the different clusters (which is how gmm is being used now).
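As a rough 1-D illustration of a cluster-based mixture (an assumption about the general shape of the idea, not how BET's gmm option is coded): one Gaussian is fit to each cluster's samples and weighted by the fraction of samples in that cluster. `fit_cluster_gmm` and `evaluate_gmm` are hypothetical names, and cluster labels are assumed to be given.

```python
from statistics import NormalDist


def fit_cluster_gmm(values, labels):
    """Fit one Gaussian per cluster label; weight by cluster size."""
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    n = len(values)
    # Each cluster needs at least two samples to estimate a standard deviation.
    return [(len(members) / n, NormalDist.from_samples(members))
            for members in clusters.values()]


def evaluate_gmm(components, points):
    """Mixture density: weighted sum of the per-cluster Gaussian pdfs."""
    return [sum(weight * dist.pdf(x) for weight, dist in components)
            for x in points]


values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
components = fit_cluster_gmm(values, labels)
densities = evaluate_gmm(components, [1.0, 6.0, 11.0])
```

This differs from a Bayesian GMM in that the components are fixed by the clustering rather than inferred, so both could reasonably coexist as separate options.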

smattis commented 3 years ago

Step 3:

> Most current inversion methods call generate_output_kdes and then call invert. But invert ALSO calls generate_output_kdes. This seems like unnecessary computation. We should remove the call to generate_output_kdes except in the invert function.

Seeing your comment, this is actually a bug that I did not catch as I was modifying the way the invert_* methods worked. You could write a PR to fix that now. I agree that once generate_densities is created, all of the invert methods should be changed as you say.
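The redundancy is easy to see in a stripped-down sketch. Everything here is a toy stand-in: `DiscretizationSketch`, `invert_redundant`, and `invert_fixed` are hypothetical names, and the method bodies only count calls. Only `generate_output_kdes` and `invert` are names from the thread.

```python
class DiscretizationSketch:
    """Toy stand-in that counts how often the densities are regenerated."""

    def __init__(self):
        self.kde_calls = 0

    def generate_output_kdes(self):
        # Placeholder for the expensive density estimation.
        self.kde_calls += 1

    def invert(self):
        # invert already generates the densities it needs.
        self.generate_output_kdes()
        return "solution"

    def invert_redundant(self):
        # Pattern described in the thread: generate, then call invert,
        # which generates again.
        self.generate_output_kdes()
        return self.invert()

    def invert_fixed(self):
        # Proposed fix: leave density generation to invert alone.
        return self.invert()


before = DiscretizationSketch()
before.invert_redundant()

after = DiscretizationSketch()
after.invert_fixed()
```

With the extra call removed, each inversion estimates the densities once instead of twice.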