Should the following line,
https://github.com/casperkaae/parmesan/blob/master/examples/vimco.py#L248
actually read
g_lb_inference = T.mean(T.sum(dg(L_corr) * log_qz_given_x, axis=2) + L)
instead of
g_lb_inference = T.mean(T.sum(dg(L_corr) * log_qz_given_x) + L)
?
I think that, with the current code, the two terms in `g_lb_inference` are scaled differently. The `T.sum` reduces `dg(L_corr) * log_qz_given_x` to a single number, which is then broadcast across all the elements of `L`, which has dimensions batchsize x eq_samples. So the second term is scaled by 1/(batchsize * eq_samples), whereas this factor cancels in the first term because it is summed that many times.
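
To make the scaling argument concrete, here is a minimal sketch that uses NumPy as a stand-in for Theano (the broadcasting behaviour in question is the same). The array sizes, the random stand-in arrays `A` and `L`, and the name `iw_samples` for the third axis are assumptions for illustration only, not values taken from `vimco.py`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the third axis name (iw_samples) is an assumption.
batchsize, eq_samples, iw_samples = 4, 3, 5

# Random stand-ins for dg(L_corr) * log_qz_given_x and for L.
A = rng.normal(size=(batchsize, eq_samples, iw_samples))
L = rng.normal(size=(batchsize, eq_samples))

# Current form: the full sum is a scalar that is broadcast onto every
# element of L before the mean is taken.
current = np.mean(np.sum(A) + L)

# Proposed form: sum only over axis 2, so the result has the same shape
# as L and both terms are averaged over batchsize * eq_samples elements.
proposed = np.mean(np.sum(A, axis=2) + L)

n = batchsize * eq_samples

# Current: first term is unscaled, second term is scaled by 1/n.
print(np.isclose(current, A.sum() + L.mean()))

# Proposed: both terms are scaled by 1/n.
print(np.isclose(proposed, A.sum() / n + L.mean()))
```

If this matches the shapes in the example script, the two forms differ only in the weight of the first term, which is larger by a factor of batchsize * eq_samples in the current code.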