gdalle opened 1 year ago
Currently reading their paper. The two main ideas are 1) representing a probability distribution on a polytope as the pushforward, through an argmax, of a simpler probability distribution on the cost vector, and 2) relaxing the argmax into a softmax to make it differentiable.
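To fix ideas, here is a tiny sketch of both steps on the unit simplex (the names, the Gaussian noise, and the temperature are mine, just for illustration, not taken from their code):

```julia
using Random, Statistics

onehot(i, d) = [j == i ? 1.0 : 0.0 for j in 1:d]

# Idea 1: a distribution over polytope vertices as the pushforward of a noise
# distribution on the cost through an argmax (here: vertices of the unit simplex).
sample_vertex(θ) = onehot(argmax(θ .+ randn(length(θ))), length(θ))

# Idea 2: relax the argmax into a softmax with temperature τ so it becomes differentiable.
softargmax(θ; τ=1.0) = (e = exp.((θ .- maximum(θ)) ./ τ); e ./ sum(e))

θ = [1.0, 0.5, -0.2]
mean(sample_vertex(θ) for _ in 1:10_000)  # empirical mean of the pushforward distribution
softargmax(θ; τ=0.5)                      # its differentiable relaxation
```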
IMO the main difference from InferOpt is that from the start they want to differentiate an expectation $\int \text{oracle}(U) p(U | \theta) dU$, whereas we want to differentiate a deterministic $\text{oracle}(\theta)$ and only use the perturbation $U \sim p(\cdot | \theta)$ as an approximation.
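To make the distinction concrete, here is a toy version with a simplex LP as the oracle (the names and the Gaussian choice for $p(\cdot | \theta)$ are mine, purely for illustration):

```julia
using Random, Statistics

# Toy oracle: LP vertex y maximizing ⟨u, y⟩ over the unit simplex, i.e. a one-hot argmax.
oracle(u) = (y = zeros(length(u)); y[argmax(u)] = 1.0; y)

# Their object: the expectation ∫ oracle(U) p(U | θ) dU, here with p(· | θ) = N(θ, ε²I),
# estimated by Monte Carlo.
expected_oracle(θ; ε=1.0, M=10_000) = mean(oracle(θ .+ ε .* randn(length(θ))) for _ in 1:M)

# Our object: the deterministic oracle(θ); the Monte Carlo average above is only the
# perturbed approximation we use to smooth it.
θ = [1.0, 0.9, 0.1]
oracle(θ), expected_oracle(θ; ε=0.3)
```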
Their paper makes the same point when positioning itself against [Berthet et al.]:

> The most superficially similar work is [Berthet et al.], which uses noisy utilities to smooth the solutions of linear programs. In [Berthet et al.] the noise is a tool for approximately relaxing a deterministic linear program. Our framework uses relaxations to approximate stochastic linear programs.
Their approach relies on being able to differentiate through a regularized optimizer, which, as they point out, is not easy. In their supplementary material they suggest either unrolling, a "custom backward pass" (not sure what that is), or finite differences. Implicit differentiation of the convex solver through the KKT conditions or the Frank-Wolfe trick is another possible method.
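For reference, in the simplest case (entropy regularization over the simplex) the regularized optimizer is just a softmax and the Jacobian is available in closed form, which is what implicit differentiation of the KKT system reduces to here; a minimal sketch, with names of my choosing:

```julia
using LinearAlgebra

# Entropy-regularized linear maximization over the simplex Δ:
# ŷ(θ) = argmax_{y ∈ Δ} ⟨θ, y⟩ + entropy(y) = softmax(θ).
softargmax(θ) = (e = exp.(θ .- maximum(θ)); e ./ sum(e))

# Differentiating through this regularized optimizer gives ∂ŷ/∂θ = Diag(ŷ) - ŷŷᵀ.
function softargmax_jacobian(θ)
    p = softargmax(θ)
    return Diagonal(p) - p * p'
end

softargmax_jacobian([1.0, 0.5, -0.2])
```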
Basically, I think once #80 is merged, we'll be able to reimplement all of the specific examples they list in their paper by putting a `Regularized` inside a `Perturbed` as the oracle.
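A hypothetical, plain-Julia sketch of that composition (the closures below just stand in for whatever the actual `Regularized` and `Perturbed` wrappers look like after #80, they are not the real API):

```julia
using Random, Statistics

# "Regularized" oracle: entropy-regularized linear maximization over the simplex.
regularized_oracle(θ) = (e = exp.(θ .- maximum(θ)); e ./ sum(e))

# "Perturbed" wrapper: Monte Carlo smoothing of whatever oracle it is given.
perturbed(oracle; ε=0.1, M=100) =
    θ -> mean(oracle(θ .+ ε .* randn(length(θ))) for _ in 1:M)

layer = perturbed(regularized_oracle; ε=0.1, M=100)  # a Regularized inside a Perturbed
layer([1.0, 0.5, -0.2])
```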
So the question we need to ask here is: in our setting, if we assume we are able to differentiate through a regularized optimizer, why should we add a perturbation in front of it?
https://arxiv.org/abs/2006.08063