choderalab / pinot

Probabilistic Inference for NOvel Therapeutics
MIT License

Toy examples for BNN #22

Open dnguyen1196 opened 4 years ago

dnguyen1196 commented 4 years ago

I think it helps to have simple toy examples where we can visualize the weights learned by the inference algorithms (for example in 2D). This is slightly different from the spiral example in scripts/adlala_experiments/adlala_spiral.ipynb where it plots the prediction but not the distribution of the learned weights.

This will help us build intuition about what the algorithm is doing (if we can directly see its trajectory). We can use these simplified versions to test/debug new algorithms before deploying them on full-scale graph neural networks. We can also devise more complicated 2D distributions to see how the inference algorithms handle them.

My suggestion is:

As much as possible, use the same inference implementations. I think Yuanqing's implementations of the inference algorithms are already very modular and general. However, the neural network can be something as simple as a single layer with a weight in R^2 and no bias. Then, we can plot the distribution of the weight parameters.

Perhaps we can place this in app/.
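
Something like the following minimal sketch is what I have in mind (hypothetical names; assumes PyTorch and matplotlib, with `weight_samples` coming from whatever inference algorithm we are testing):

```python
import torch
import matplotlib.pyplot as plt

class ToyLinear(torch.nn.Module):
    """Single linear layer: a weight in R^2, no bias."""
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(2))

    def forward(self, x):
        # x: (N, 2) -> predictions: (N,)
        return x @ self.w

# toy regression data with a known true weight
torch.manual_seed(0)
x = torch.randn(100, 2)
true_w = torch.tensor([1.5, -0.5])
y = x @ true_w + 0.1 * torch.randn(100)

model = ToyLinear()
# `weight_samples` would be collected by running the inference algorithm under
# test on (model, x, y); placeholder samples are used here so the snippet runs.
weight_samples = torch.stack([true_w + 0.1 * torch.randn(2) for _ in range(500)])

plt.scatter(weight_samples[:, 0], weight_samples[:, 1], s=5, alpha=0.3, label="sampled weights")
plt.scatter(true_w[0].item(), true_w[1].item(), color="red", marker="x", label="true weight")
plt.xlabel("$w_1$")
plt.ylabel("$w_2$")
plt.legend()
plt.show()
```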

maxentile commented 4 years ago

Great idea! You're on a roll. I like the idea of having 1D or 2D examples for visualization and exploration as much as possible.

I think it helps to have simple toy examples where we can visualize the weights learned by the inference algorithms (for example in 2D). This is slightly different from the spiral example in scripts/adlala_experiments/adlala_spiral.ipynb where it plots the prediction but not the distribution of the learned weights.

Could you elaborate a bit more how to get a 2D representation of a model's parameters? (Would the model itself have just 2 parameters, or perhaps many pairs of related parameters (which could be represented as a collection of 2D points), or perhaps have a large vector of parameters which is projected down to a single 2D point?)

dnguyen1196 commented 4 years ago

Could you elaborate a bit more how to get a 2D representation of a model's parameters? (Would the model itself have just 2 parameters

Yeah, that's exactly what I meant: the neural network / linear classifier / regression just has two parameters. I'm not sure about the case where we have more than two parameters; something like you said might work (groups of points, embedding an n-dimensional vector into 2D), but it will take more work to figure out how to "embed" the points from n dimensions into 2 meaningfully.

maxentile commented 4 years ago

Sounds good! Plotting results for a 2-parameter model like linear regression in 1D should be highly diagnostic.

An approach that may be interesting for us is [Izmailov et al., 2019] "Subspace inference for Bayesian deep learning" (http://auai.org/uai2019/proceedings/papers/435.pdf). There, the neural network model may have oodles of parameters, but we only try to run MCMC in a manageably small parameter subspace that hopefully contains a diverse enough set of models (the authors include an example of a 2D subspace).
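
As a rough sketch of the idea (not the authors' code; assumes PyTorch, and the projection matrix here is just a placeholder): all of the network's weights are written as a fixed anchor plus a linear map of a low-dimensional vector z, and sampling is done over z only.

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def subspace_predict(model, theta_0, projection, z, x):
    """Evaluate `model` at weights theta = theta_0 + projection @ z.

    theta_0:    (n_params,) anchor weights (e.g. a pretrained solution)
    projection: (n_params, d) subspace basis (PCA of an SGD trajectory in the
                paper; a random projection would also give a d-dim test problem)
    z:          (d,) low-dimensional parameters that MCMC actually samples
    """
    theta = theta_0 + projection @ z
    # write the flat weight vector back into the model's parameters
    # (a gradient-based sampler would instead use a functional call so that
    #  gradients can flow back to z)
    vector_to_parameters(theta, model.parameters())
    return model(x)

# toy usage with a 2D subspace around a randomly initialized model
model = torch.nn.Linear(10, 1)
theta_0 = parameters_to_vector(model.parameters()).detach()
projection = torch.randn(theta_0.numel(), 2)
y_hat = subspace_predict(model, theta_0, projection, torch.zeros(2), torch.randn(5, 10))
```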

karalets commented 4 years ago

Josh, some comments on this paper: it does not work as you may wish.

The authors have to train their models with maximum likelihood and then do some ad hoc procedure that they call Bayesian inference; they can by no means just specify their model, run inference, and get a result as you seem to imagine. My paper that they cite (https://arxiv.org/pdf/1810.00555.pdf) actually works with a clean prior and does inference as one might wish, but it is pretty slow and, I'd say, 'early days' research compared to more vanilla techniques.

Similarly, "Latent Projection BNNs: Avoiding weight-space pathologies by learning latent representations of neural network weights" by Melanie (which also cites my paper, and which I like much more than the one you mention) trains an autoencoder to capture weight spaces. That is more appealing than the other paper as well, but it is also not without practical challenges when it comes to being useful in a general setting.

In short: I would not advise any of these methods anytime soon. This stuff is all still too experimental for purely empirical work; it is deep learning research, not application-oriented research. You are assuming that hierarchical "low-d" representations of NNs are solved to a degree that we should do biology/chemistry with them, and they just are not, so using them will set someone back by several months before getting results.

About the 2D plots Duc is thinking about: I made a bunch of such plots in past papers, e.g. Figure 3 in https://arxiv.org/pdf/1810.00555.pdf.

It is quite intuitive to plot predictive distributions over such things.
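
For example, a generic version of such a plot might look like this (a sketch, not code from that paper; it assumes an array of predictive curves, one per posterior sample, and uses placeholder values so it runs standalone):

```python
import numpy as np
import matplotlib.pyplot as plt

# x_grid: (G,) test inputs; predictions: (S, G), one row per posterior sample
# (placeholder values here; in practice each row would come from the model
#  evaluated at one sampled weight vector)
x_grid = np.linspace(-3, 3, 200)
predictions = np.stack(
    [np.sin(x_grid) + 0.1 * np.random.randn(x_grid.size) for _ in range(100)]
)

mean = predictions.mean(axis=0)
lower, upper = np.percentile(predictions, [2.5, 97.5], axis=0)

plt.plot(x_grid, mean, label="posterior predictive mean")
plt.fill_between(x_grid, lower, upper, alpha=0.3, label="95% credible band")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```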

So my advice is, as I have been trying to argue for here, to be systematic: build infrastructure and try reliable basics before doing fancy-schmancy stuff that has not yet been 'commoditized', and then, once we have solid footing, also try those things if the other stuff fails. I feel you are about to run into a rabbit hole of cutting-edge DL research papers that may not work very predictably in this setting, before having covered the more predictable basics.


maxentile commented 4 years ago

Thanks for these comments and references! Your points are generally well-taken, and I agree about building up from the simplest things with the most predictable performance.

I have my own reservations about work in this area. I didn't mean to imply an expectation of usefulness for applications, but to mention an option for getting a 2D parameter space to sample and plot. Random projection sounded like a simple way to get a 2D parameter-sampling problem that exercises the same machinery as the original problem, for testing and visualization purposes. I'll take a closer look at how to recreate something like Figure 3 in https://arxiv.org/abs/1810.00555.
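
For concreteness, the kind of thing I had in mind (a hypothetical sketch, not existing pinot code): wrap a full-dimensional log-posterior so that whatever sampler we want to test only ever sees a 2-vector.

```python
import torch

def project_log_prob(log_prob_theta, theta_0, seed=0):
    """Turn a log-density over the full flat weight vector into a log-density
    over a 2D vector z, via theta = theta_0 + P @ z with a fixed random P.

    log_prob_theta: callable taking a flat (n_params,) tensor
    theta_0:        (n_params,) anchor point in weight space
    """
    generator = torch.Generator().manual_seed(seed)
    P = torch.randn(theta_0.numel(), 2, generator=generator)

    def log_prob_z(z):
        return log_prob_theta(theta_0 + P @ z)

    return log_prob_z

# toy usage: a standard-normal "posterior" over 1000 weights, reduced to 2D
log_prob_theta = lambda theta: -0.5 * (theta ** 2).sum()
log_prob_z = project_log_prob(log_prob_theta, theta_0=torch.zeros(1000))
print(log_prob_z(torch.tensor([0.1, -0.2])))
```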

Btw, late next week @yuanqing-wang will lead a journal club discussion on Bayesian treatments of neural networks. Perhaps before then, we should pick your brain for further references and perspectives about this area!