Fix bug in Marginal 1D tool & serialization

facebookresearch / beanmachine

A library that allows for inference on probabilistic models

https://beanmachine.org/

MIT License

264 stars 49 forks source link

Fix bug in Marginal 1D tool & serialization #1759

Closed ndmlny-qs closed 1 year ago

ndmlny-qs commented 1 year ago

Motivation

Fixes incorrect mean value calculation in the tool.
Updates how serialization is performed to catch indexed random variables.

Changes proposed

Removes console logging from the Marginal1d tool, as it is not necessary.
Updates how a Bean Machine model is serialized. Previously, we were not capturing higher dimensions of random variables, which can happen when a random variable has a dummy index.
Fixes a bug where we calculated the mean of the given 1D KDE estimate of a random variable, instead of calculating the mean of the actual raw data.

Test Plan

More tutorials were used to visually inspect the output of the tool.

Coin flipping.
Robust linear regression.
Logistic regression.
Sparse logistic regression.

Types of changes

[ ] Docs change / refactoring / dependency upgrade
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist

[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.
[ ] I have updated the documentation accordingly.
[x] I have read the CONTRIBUTING document.
[ ] I have added tests to cover my changes.
[ ] All new and existing tests passed.
[x] The title of my pull request is a short description of the requested changes.

facebook-github-bot commented 1 year ago

@horizon-blue has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ndmlny-qs commented 1 year ago

@horizon-blue I like your idea, restated here for context.

...we should find a way to let [the] user [index] into the dimensions [of a random variable]...

This is a great idea, and I think I can explore that in this PR. This pattern comes up in several tutorials, and should be something that the user is capable of doing. It will require a change to the UI, but the added functionality adds a lot to the UX. Especially when a model has hundreds of indexed values for a single random variable.

horizon-blue commented 1 year ago

This is a great idea, and I think I can explore that in this PR.

Glad to hear that you like this idea (and thanks for correcting the many grammar errors that I made :) ), though we don't necessarily have to get this done in this PR. Since this PR is about fixing a bug that impacts the usability of the tool, it's better for us to get a workable version soon. After the bug is fixed, we can then work on UX improvements freely.

horizon-blue commented 1 year ago

Another thing that we should be aware of is that the MonteCarloSamples object has a get_chain() method that returns a narrower version of the MonteCarloSamples, where each value no longer has the leading chain dimension. With the change here we will run into error if people try to run samples.get_chain(0).diagnostics.marginal1d().

Which leads me to think that it might be easier to convert a MonteCarloSamples object to an arviz.InferenceData object before serializing it, because the underlying xarray dataset has better support for indexing and slicing operators. It shouldn't require too much change here. Plus, people can convert samples from other PPL to an InferenceData and manually invoke the diagnostic widgets on it, if they ever want to do so.

ndmlny-qs commented 1 year ago

I'm going to split this PR into two:

the bug fix for the mean calculation
serialization updates

I think the split will allow for more focused discussion about the serializer, and I can focus on writing tests for it in that branch.

facebook-github-bot commented 1 year ago

@ndmlny-qs has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 year ago

@horizon-blue has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.