about the tractable estimator

Hi, in the paper you prove that as T goes to infinity, the estimate of the conditional mutual information converges to the true conditional mutual information between the output y and the parameters w. Why is this necessary? If I can derive an expression that is proportional to the conditional mutual information, can I use it to measure uncertainty in the BALD framework? Why or why not? Thanks!
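For reference, here is a minimal sketch of the kind of Monte Carlo estimator the question refers to. It assumes that T stochastic forward passes (e.g. MC dropout) each produce softmax probabilities. The score is the entropy of the averaged prediction minus the average entropy of the individual predictions. As T grows, both averages approach their expectations over the approximate posterior. The function name `bald_mc_estimate` and the `(T, N, C)` array layout are illustrative assumptions, not code taken from the repository.

```python
import numpy as np


def bald_mc_estimate(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Monte Carlo estimate of the BALD score I[y; w | x, D_train].

    Hypothetical helper (not from the repository).
    probs: shape (T, N, C), class probabilities from T stochastic forward
           passes for N inputs and C classes.
    Returns: shape (N,), one score per input.
    """
    # Predictive distribution: average the T sampled softmax outputs.
    mean_probs = probs.mean(axis=0)  # (N, C)
    # Entropy of the averaged prediction, H[y | x, D_train].
    predictive_entropy = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    # Average entropy of the individual samples, approximating E_w[H[y | x, w]].
    expected_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=-1), axis=0)
    # Mutual information estimate: predictive entropy minus expected entropy.
    return predictive_entropy - expected_entropy
```

In this sketch, a point whose samples all agree on an uncertain prediction scores near zero. A point whose samples confidently disagree scores near log(C), the maximum.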

Riashat / Deep-Bayesian-Active-Learning

about the tractable estimator #8