facebookresearch / beanmachine

A library that allows for inference on probabilistic models
https://beanmachine.org/
MIT License
264 stars 49 forks source link

Fix bug in Marginal 1D tool & serialization #1759

Closed ndmlny-qs closed 1 year ago

ndmlny-qs commented 1 year ago

Motivation

Changes proposed

Test Plan

More tutorials were used to visually inspect the output of the tool.

Types of changes

Checklist

facebook-github-bot commented 1 year ago

@horizon-blue has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ndmlny-qs commented 1 year ago

@horizon-blue I like your idea, restated here for context.

...we should find a way to let [the] user [index] into the dimensions [of a random variable]...

This is a great idea, and I think I can explore that in this PR. This pattern comes up in several tutorials, and should be something that the user is capable of doing. It will require a change to the UI, but the added functionality adds a lot to the UX. Especially when a model has hundreds of indexed values for a single random variable.

horizon-blue commented 1 year ago

This is a great idea, and I think I can explore that in this PR.

Glad to hear that you like this idea (and thanks for correcting the many grammar errors that I made :) ), though we don't necessarily have to get this done in this PR. Since this PR is about fixing a bug that impacts the usability of the tool, it's better for us to get a workable version soon. After the bug is fixed, we can then work on UX improvements freely.

horizon-blue commented 1 year ago

Another thing that we should be aware of is that the MonteCarloSamples object has a get_chain() method that returns a narrower version of the MonteCarloSamples, where each value no longer has the leading chain dimension. With the change here we will run into error if people try to run samples.get_chain(0).diagnostics.marginal1d().

Which leads me to think that it might be easier to convert a MonteCarloSamples object to an arviz.InferenceData object before serializing it, because the underlying xarray dataset has better support for indexing and slicing operators. It shouldn't require too much change here. Plus, people can convert samples from other PPL to an InferenceData and manually invoke the diagnostic widgets on it, if they ever want to do so.

ndmlny-qs commented 1 year ago

I'm going to split this PR into two:

  1. the bug fix for the mean calculation
  2. serialization updates

I think the split will allow for more focused discussion about the serializer, and I can focus on writing tests for it in that branch.

facebook-github-bot commented 1 year ago

@ndmlny-qs has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 year ago

@horizon-blue has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.