JuliaDecisionFocusedLearning / InferOpt.jl

Combinatorial optimization layers for machine learning pipelines
https://juliadecisionfocusedlearning.github.io/InferOpt.jl/
MIT License

Add stochastic softmax tricks #48

Open gdalle opened 1 year ago

gdalle commented 1 year ago

https://arxiv.org/abs/2006.08063

gdalle commented 1 year ago

Currently reading their paper. The two main ideas are: (1) representing a probability distribution on a polytope as the pushforward, through an argmax, of a simpler probability distribution on the cost vector; (2) relaxing the argmax into a softmax to make it differentiable.
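To make ideas (1) and (2) concrete, here is a minimal sketch for the simplest polytope, the probability simplex, where both ideas reduce to the well-known Gumbel-max / Gumbel-softmax tricks. This is written in NumPy rather than Julia since the construction is library-agnostic; the function names are illustrative, not part of any API.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, 2.0, 0.5])  # utilities

def gumbel_max_sample(theta, rng):
    # Idea (1): a distribution on the simplex vertices, obtained as the
    # pushforward of Gumbel noise on the cost through an argmax.
    u = theta + rng.gumbel(size=theta.shape)
    onehot = np.zeros_like(theta)
    onehot[np.argmax(u)] = 1.0
    return onehot

def gumbel_softmax_sample(theta, rng, temp=0.5):
    # Idea (2): relax the argmax into a tempered softmax so each
    # sample is differentiable in theta.
    u = theta + rng.gumbel(size=theta.shape)
    z = np.exp((u - u.max()) / temp)
    return z / z.sum()

# By the Gumbel-max trick, the vertex frequencies match softmax(theta):
n = 50_000
freq = sum(gumbel_max_sample(theta, rng) for _ in range(n)) / n
softmax = np.exp(theta) / np.exp(theta).sum()
print(np.round(freq, 2), np.round(softmax, 2))
```

For general polytopes the pushforward distribution has no such closed form, which is exactly why the paper needs the relaxation machinery.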

IMO the main difference from InferOpt is that from the start they want to differentiate an expectation $\int \text{oracle}(U) p(U | \theta) dU$, whereas we want to differentiate a deterministic $\text{oracle}(\theta)$ and only use perturbation $U \sim p(\cdot | \theta)$ as an approximation.
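The InferOpt-side view can be sketched as follows: the deterministic oracle is piecewise constant in $\theta$, and the perturbation $U \sim p(\cdot \mid \theta)$ is only a Monte Carlo device to smooth it. Again a NumPy sketch with illustrative names, not InferOpt's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

def oracle(theta):
    # Deterministic linear maximizer over simplex vertices:
    # the one-hot solution of argmax_i theta_i (piecewise constant).
    y = np.zeros_like(theta)
    y[np.argmax(theta)] = 1.0
    return y

def perturbed_oracle(theta, eps=1.0, n_samples=10_000, rng=rng):
    # Smoothing by perturbation: Monte Carlo estimate of
    # E[oracle(theta + eps * Z)] with Z standard Gaussian.
    Z = rng.normal(size=(n_samples, theta.size))
    return np.mean([oracle(theta + eps * z) for z in Z], axis=0)

theta = np.array([0.3, 0.2])
y_det = oracle(theta)             # hard vertex, zero gradient a.e.
y_smooth = perturbed_oracle(theta)  # interior point, smooth in theta
print(y_det, np.round(y_smooth, 3))
```

In the SST paper the expectation itself is the object of interest; here it only approximates the deterministic map, which is the difference noted above.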

Quoting their comparison with [Berthet et al.] (so "our framework" below refers to theirs):

> The most superficially similar work is [Berthet et al.], which uses noisy utilities to smooth the solutions of linear programs. In [Berthet et al.] the noise is a tool for approximately relaxing a deterministic linear program. Our framework uses relaxations to approximate stochastic linear programs.

Their approach relies on being able to differentiate through a regularized optimizer, which, as they point out, is not easy. In their supplementary material they suggest either unrolling, a "custom backward pass" (not sure what that is), or finite differences. Implicit differentiation of the convex solver through the KKT conditions, or the Frank-Wolfe trick, is another possible method.
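For the entropy-regularized argmax over the simplex the comparison is easy to check, because the regularized solution is exactly the softmax and the Jacobian that implicit differentiation of the KKT conditions recovers has the closed form $\operatorname{diag}(p) - pp^\top$. A sketch validating it against the finite-difference fallback they mention (NumPy for illustration):

```python
import numpy as np

def softmax(theta):
    # Entropy-regularized linear maximizer over the simplex:
    # argmax_{p in simplex} <theta, p> + H(p) = softmax(theta).
    z = np.exp(theta - theta.max())
    return z / z.sum()

def softmax_jacobian(theta):
    # Closed form that implicit differentiation yields here:
    # J = diag(p) - p p^T.
    p = softmax(theta)
    return np.diag(p) - np.outer(p, p)

def fd_jacobian(f, theta, h=1e-6):
    # Central finite differences, column by column.
    n = theta.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(theta + e) - f(theta - e)) / (2 * h)
    return J

theta = np.array([0.1, -0.4, 1.2])
J_implicit = softmax_jacobian(theta)
J_fd = fd_jacobian(softmax, theta)
print(np.max(np.abs(J_implicit - J_fd)))  # agreement up to discretization error
```

For regularizers without a closed-form solution, the same Jacobian would come out of solving the linearized KKT system, which is the implicit-differentiation route.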

Basically, I think that once #80 is merged, we'll be able to reimplement all of the specific examples they list in their paper by putting a `Regularized` inside a `Perturbed` as the oracle.
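The composition can be sketched in a few lines: the outer perturbation layer draws noisy costs, the inner regularized oracle (here the entropy-regularized argmax, i.e. a softmax) maps each draw into the polytope, and averaging gives the SST-style expectation. This is a NumPy mock-up of the idea, not the `Perturbed`/`Regularized` API itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(theta):
    # Stand-in for a regularized oracle: entropy-regularized argmax.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def perturbed_regularized(theta, eps=0.5, n_samples=5_000, rng=rng):
    # Outer layer: perturb the costs; inner layer: regularized oracle;
    # average the relaxed solutions over the noise draws.
    Z = rng.normal(size=(n_samples, theta.size))
    return np.mean([softmax(theta + eps * z) for z in Z], axis=0)

theta = np.array([1.0, 0.0, -1.0])
print(np.round(perturbed_regularized(theta), 3))
```

Each noise draw lands in the interior of the polytope rather than on a vertex, which is what makes the composed layer differentiable path-wise.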

gdalle commented 1 year ago

So the question we need to ask here is: in our setting, if we assume we are able to differentiate through a regularized optimizer, why should we add a perturbation in front of it?