Open HEmile opened 4 years ago
We can fit this somewhat naturally into the DiCE formulation as follows: as the multiplicative estimator, use $\theta_i \cdot \bot(c_{\theta_i} \cdot \frac{p^+(x)}{p(x)})$ for samples from the positive component, and $-\theta_i \cdot \bot(c_{\theta_i} \cdot \frac{p^-(x)}{p(x)})$ for samples from the negative component. This allows us to compensate for the importance weighting used in the rest of the calculation.
The compensating factor can be implemented by simply taking the inverse of the importance weight.
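
To spell out the cancellation, here is a sketch under one reading of the above. It assumes the downstream importance weight for a sample from the positive component is $p(x)/p^+(x)$ (this is not stated explicitly in the thread), and it treats that weight and $f(x)$ as constants with respect to $\theta_i$:

$$
\nabla_{\theta_i}\left[\theta_i \cdot \bot\!\left(c_{\theta_i} \cdot \frac{p^+(x)}{p(x)}\right) \cdot \frac{p(x)}{p^+(x)} \cdot f(x)\right] = c_{\theta_i} \, f(x), \qquad x \sim p^+
$$

The same steps with the negative component give $-c_{\theta_i} \, f(x)$ for $x \sim p^-$, so the two contributions together form the difference $c_{\theta_i}(f(x_1) - f(x_2))$ with $x_1 \sim p^+$ and $x_2 \sim p^-$.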
Measure-valued derivatives (MVD) are an alternative to the REINFORCE/score-function estimator. See https://arxiv.org/pdf/1906.10652.pdf for a clear explanation.
Implementing it raises some problems, though! Samples are drawn from the positive and negative probability components. This means that blindly applying MC won't work for downstream estimation: the samples are not drawn from the original distribution. We can easily fix this by importance sampling using the weighting function. Furthermore, to make it compatible with auto-diff, a solution could be: $\sum_i \theta_i \bot(c_{\theta_i}(f(x_1)-f(x_2)))$, where $x_1 \sim p^+$ and $x_2\sim p^-$.
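
As a concrete illustration of the $c_{\theta_i}(f(x_1)-f(x_2))$ term, here is a minimal stdlib-only sketch for the mean of a Gaussian $N(\mu, \sigma^2)$. It assumes the decomposition $c_\mu = 1/(\sigma\sqrt{2\pi})$, $x_1 = \mu + \sigma r$, $x_2 = \mu - \sigma r$ with $r$ drawn from a unit-scale Rayleigh distribution. The function names `sample_rayleigh` and `mvd_grad_mean` are made up for this example, and reusing the same $r$ for both samples is a coupling choice of the sketch, not something the proposal requires. It does not use auto-diff: it computes directly the value that differentiating the surrogate with respect to $\theta_i$ would return.

```python
import math
import random


def sample_rayleigh(rng):
    """Draw from a unit-scale Rayleigh distribution via inverse-CDF sampling."""
    # 1 - rng.random() lies in (0, 1], so the log is always finite.
    return math.sqrt(-2.0 * math.log(1.0 - rng.random()))


def mvd_grad_mean(f, mu, sigma, n_samples, rng):
    """Monte Carlo estimate of d/dmu E_{x ~ N(mu, sigma^2)}[f(x)] using MVD."""
    # Normalising constant c_mu of the (assumed) Gaussian-mean decomposition.
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for _ in range(n_samples):
        r = sample_rayleigh(rng)
        x_pos = mu + sigma * r  # x_1 ~ p^+
        x_neg = mu - sigma * r  # x_2 ~ p^- (same r reused: coupling)
        total += c * (f(x_pos) - f(x_neg))
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = mvd_grad_mean(lambda x: x * x, mu=1.5, sigma=0.7,
                             n_samples=20000, rng=rng)
    # Analytic value for f(x) = x^2: d/dmu (mu^2 + sigma^2) = 2 * mu = 3.0
    print(estimate)
```

For $f(x) = x^2$ the estimate should land near the analytic gradient $2\mu$. Note that $f$ is evaluated only on samples from $p^+$ and $p^-$, never from $p$ itself, which is exactly why any downstream quantity that expects samples from $p$ needs the importance-weight correction described above.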