dustinvtran opened 8 years ago
I was browsing the Stan-users mailing list at 3am. Someone implemented a Bayesian neural network in Stan(!). There are two different implementations available: [1], [2a, 2b], discussion here
We should include one of these as a model example, and also implement the same model in TensorFlow (which is very easy).
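(For illustration, a minimal NumPy sketch of the generative half of a Bayesian neural network — draws of functions from the prior of a one-hidden-layer net with standard normal priors on all weights. This is not the Stan or Edward implementation referenced above; the function name and sizes are made up for the example.)

```python
import numpy as np

def sample_bnn_prior(x, n_samples=10, hidden=16, rng=None):
    """Draw functions from the prior of a one-hidden-layer Bayesian
    neural net: all weights and biases get independent N(0, 1) priors,
    and each weight sample induces one function x -> y."""
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    outputs = []
    for _ in range(n_samples):
        W0 = rng.standard_normal((d, hidden))   # input-to-hidden weights
        b0 = rng.standard_normal(hidden)
        W1 = rng.standard_normal((hidden, 1))   # hidden-to-output weights
        b1 = rng.standard_normal(1)
        h = np.tanh(x @ W0 + b0)                # hidden activations
        outputs.append(h @ W1 + b1)             # one prior function draw
    return np.stack(outputs)                    # (n_samples, n_points, 1)

draws = sample_bnn_prior(np.linspace(-3, 3, 50).reshape(-1, 1), rng=0)
```

Inference (MAP, VI, or HMC over the weights) is then layered on top of exactly this generative structure.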
@dustinvtran I want to contribute to Edward and feel that working on this issue will help me more as well. Do you have any suggestions as I go forward? I'm using this guide for the dev workflow.
perfect! i think a good way to start is maybe to select one of the models you're interested in above and then to implement a working version of it using some data and edward's inference.
@dustinvtran I'm thinking of starting with 'Poisson process' next. Do you have any references that might help me? I'm not familiar with many variational inference models.
@siddharth-agrawal: Cool. You can try building off https://github.com/blei-lab/edward/pull/294, which I think @diengadji was working on but is no longer(?). It defines a Cox process for spatial data.
You can look into using the same example or a simpler variant (e.g., a Poisson process instead of a Cox process, with i.i.d. data). I also recommend using a real data set, perhaps one that Poisson processes are typically applied to. For a more vanilla Poisson process, I also recommend point estimation via MAP instead of VI.
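(To make the MAP suggestion concrete, here is a minimal NumPy sketch — not Edward code — for the simplest i.i.d.-counts version. With a Gamma(alpha, beta) prior on a Poisson rate, the posterior is Gamma(alpha + sum(x), beta + n), and its mode is the MAP estimate in closed form. The function name and prior hyperparameters are made up for the example.)

```python
import numpy as np

def poisson_rate_map(x, alpha=2.0, beta=1.0):
    """MAP estimate of a Poisson rate under a Gamma(alpha, beta) prior.

    The posterior is Gamma(alpha + sum(x), beta + n); for
    alpha + sum(x) >= 1 its mode is (alpha + sum(x) - 1) / (beta + n).
    """
    x = np.asarray(x)
    return (alpha + x.sum() - 1.0) / (beta + len(x))

counts = np.array([3, 1, 4, 1, 5])       # hypothetical i.i.d. counts
rate_hat = poisson_rate_map(counts)      # (2 + 14 - 1) / (1 + 5) = 2.5
```

In Edward the same thing would be expressed with `ed.MAP` over the model's latent rate, but the closed form above is a useful sanity check.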
@dustinvtran After I play a bit more with Edward and look through the code in more depth, I'd like to have a go at implementing a Markov random field. Aware of anyone working on undirected models?
Undirected is difficult. We haven't focused on it, although there is limited support such as for undirected models that can be conditionally specified (see https://github.com/mariru/exponential_family_embeddings).
We don't really know how to expose an undirected model's graph structure in the computational graph. But that would be the first step.
Please use Gibbs sampling as the inference method.
@dustinvtran I'd be happy to take a stab at PMF using the original netflix ratings. Is anyone else on this already?
Ah, just saw https://github.com/blei-lab/edward/pull/557/files / https://github.com/blei-lab/edward/blob/master/examples/probabilistic_matrix_factorization.py from @siddharth-agrawal
Still happy to make a notebook using the netflix data in / notebooks
Cool, I am looking forward to Edward's performance on a public dataset! BTW, Tsinghua University has released a similar Python library using TensorFlow as the backend, ZhuSuan (https://github.com/thu-ml/zhusuan). Has anyone compared it with Edward?
I've been trying to get vanilla PMF working on the larger MovieLens dataset (20 million ratings, ~20k movies, ~130k users) without much luck.
https://github.com/blei-lab/edward/pull/682
Would greatly appreciate any tips!
@patrickeganfoley I looked at your notebook. A couple of suggestions:
Start simple: use a MAP point estimate rather than going full variational. In the coordinate-ascent setting, for Gaussian MF, the MAP point estimate matches the variational mean, and you won't get much from the (under-estimated) approximating variances anyway.
Start simple 2: use global regularization rather than per-user/item, especially when fitting only on observed ratings -- the data is very sparse and probably cannot give you a reasonable estimate on a per-user/item basis.
(Updated) `K = 2` is too simplistic. I would at least start with `K = 5`.
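(A minimal NumPy sketch of the "MAP with one global regularizer" suggestion, assuming Gaussian matrix factorization fit by alternating ridge regressions on the observed entries only. `pmf_map` and its arguments are made up for the example, not Edward's API.)

```python
import numpy as np

def pmf_map(R, mask, K=5, lam=0.1, n_iters=20, rng=0):
    """MAP fit of Gaussian matrix factorization with a single global
    regularizer lam (equivalently, a shared N(0, 1/lam) prior on all
    user and item vectors), by alternating ridge regressions over
    observed entries.  mask[i, j] is True where R[i, j] is observed."""
    rng = np.random.default_rng(rng)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, K))
    V = 0.1 * rng.standard_normal((n_items, K))
    eye = lam * np.eye(K)
    for _ in range(n_iters):
        for i in range(n_users):                 # update user vectors
            j = mask[i]
            if j.any():
                U[i] = np.linalg.solve(V[j].T @ V[j] + eye, V[j].T @ R[i, j])
        for jdx in range(n_items):               # update item vectors
            i = mask[:, jdx]
            if i.any():
                V[jdx] = np.linalg.solve(U[i].T @ U[i] + eye, U[i].T @ R[i, jdx])
    return U, V

# Toy rank-1 ratings matrix, fully observed, recovered with K=2.
R = np.outer(np.array([1.0, 2.0, 3.0]), np.array([1.0, 0.5, 2.0]))
mask = np.ones_like(R, dtype=bool)
U, V = pmf_map(R, mask, K=2, lam=0.01, n_iters=50)
```

The single `lam` here is the "global regularization" being recommended; a per-user/item version would replace `eye` with a different `lam_i * np.eye(K)` per row, which is exactly what the sparse data can't support.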
Thanks @dawenl! Will try MAP now.
on 2 - I'm not sure if I completely understand. I am pretty sure I am only setting up 4 regularization terms (user offset, movie offset, user vec, movie vec) which doesn't seem to be too much to ask of 20m ratings.
OK, I should have looked more closely. I thought `sigma_user_betas` and `sigma_movie_betas` were set up on a per-user/item basis.
@dawenl thank you for your help! (discussing here https://github.com/blei-lab/edward/pull/682 so as to not block discussion of other examples)
@dustinvtran I would like to start working on a Bayesian word embedding model. Do you have any references on how to implement a variational inference model? I saw the LSTM language model in Edward's examples. Thanks.
@iliemihai I know @mariru was extending https://github.com/mariru/exponential_family_embeddings to work on probabilistic embedding models with variational inference. In particular, normal priors over embedding and context vectors, and normal variational distributions to approximate their posteriors. It could be useful to start from there. (Alternatively, if you want just Bayesian RNNs, you can take the LSTM language model and replace MAP with variational inference.)
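(To illustrate what a "normal variational distribution" does mechanically, here is a self-contained NumPy sketch on a toy conjugate model, z ~ N(0, 1), x_i | z ~ N(z, 1), where the exact posterior is N(sum(x)/(n+1), 1/(n+1)). It fits q(z) = N(mu, sigma^2) by stochastic gradient ascent on the ELBO using the reparameterization z = mu + sigma * eps — the same trick Edward's KLqp uses, just written out by hand. All names here are made up for the example.)

```python
import numpy as np

def fit_normal_vi(x, n_iters=2000, n_mc=32, lr=0.01, rng=0):
    """Fit q(z) = N(mu, sigma^2) to the posterior of z ~ N(0, 1),
    x_i | z ~ N(z, 1), by stochastic gradients of the ELBO with the
    reparameterization z = mu + sigma * eps, eps ~ N(0, 1)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, s = len(x), x.sum()
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_iters):
        eps = rng.standard_normal(n_mc)
        z = mu + np.exp(log_sigma) * eps
        dlogp = -(n + 1) * z + s            # d log p(x, z) / dz
        mu += lr * dlogp.mean()             # chain rule: dz/dmu = 1
        # dz/dlog_sigma = sigma * eps; the +1 is the entropy gradient
        log_sigma += lr * ((dlogp * np.exp(log_sigma) * eps).mean() + 1.0)
    return mu, np.exp(log_sigma)

mu, sigma = fit_normal_vi([1.0, 2.0, 3.0])  # exact: mu = 1.5, sigma = 0.5
```

For embeddings, each word's vector gets its own `(mu, sigma)` pair, but the gradient machinery is the same.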
@iliemihai check out the branch feature/edward in the https://github.com/mariru/exponential_family_embeddings repo. It uses edward's MAP inference
It'd be great to have high-profile model examples that we can highlight on the front page. Some ideas:
We can think about the choice of inference algorithm and data for the above later.