JuliaOptimalTransport / OptimalTransport.jl

Optimal transport algorithms for Julia
https://juliaoptimaltransport.github.io/OptimalTransport.jl/dev
MIT License

add davibarreira's sinkhorn_divergence with some modifications #145

Closed · zsteve closed this 2 years ago

zsteve commented 2 years ago

This PR adds sinkhorn_divergence due to @davibarreira in PR #92, except that I have removed the use of FiniteDiscreteMeasure, DiscreteNonParametric, etc. For now, the implementation takes measures as vectors of weights together with precomputed cost matrices. A future PR could implement wrappers that take generic distributions as inputs.

Notably, in the case where the support is fixed and there is a common cost matrix between source and target measures (as is the case on a fixed discrete domain), sinkhorn_divergence now takes advantage of the batched Sinkhorn computation.
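For concreteness, here is a usage sketch matching that description (the method signatures and names below are illustrative rather than final):

```julia
using OptimalTransport, Distances

# Two empirical measures given as weight vectors; the support locations
# enter only through the precomputed cost matrices.
x, y = rand(1, 50), rand(1, 100)
μ, ν = fill(1 / 50, 50), fill(1 / 100, 100)
Cμν = pairwise(SqEuclidean(), x, y; dims=2)  # cross cost (50×100)
Cμμ = pairwise(SqEuclidean(), x; dims=2)     # source self-cost
Cνν = pairwise(SqEuclidean(), y; dims=2)     # target self-cost
ε = 0.01

# General case: one cost matrix per term of the divergence.
sinkhorn_divergence(μ, ν, Cμν, Cμμ, Cνν, ε)

# Fixed common support: a single cost matrix for all terms, which is the
# case where the batched Sinkhorn computation kicks in.
a = rand(50); a ./= sum(a)
b = rand(50); b ./= sum(b)
sinkhorn_divergence(a, b, Cμμ, ε)
```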

coveralls commented 2 years ago

Pull Request Test Coverage Report for Build 1229692504

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/entropic/sinkhorn.jl | 16 | 77.14% |
Totals Coverage Status:

- Change from base Build 1227604199: -2.7%
- Covered Lines: 508
- Relevant Lines: 554

💛 - Coveralls
codecov-commenter commented 2 years ago

Codecov Report

Merging #145 (f4d6474) into master (cc3ab16) will decrease coverage by 9.15%. The diff coverage is 30.00%.

Impacted file tree graph

```
@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
- Coverage   94.42%   85.27%   -9.15%
==========================================
  Files          12       14       +2
  Lines         538      618      +80
==========================================
+ Hits          508      527      +19
- Misses         30       91      +61
```
| Impacted Files | Coverage Δ |
|---|---|
| src/entropic/sinkhorn_divergence.jl | 0.00% <0.00%> (ø) |
| src/entropic/sinkhorn_stabilized.jl | 100.00% <ø> (ø) |
| src/entropic/symmetric.jl | 0.00% <0.00%> (ø) |
| src/entropic/sinkhorn.jl | 100.00% <100.00%> (ø) |
| src/entropic/sinkhorn_epsscaling.jl | 95.23% <100.00%> (ø) |
| src/entropic/sinkhorn_gibbs.jl | 100.00% <100.00%> (ø) |
| src/entropic/sinkhorn_solve.jl | 100.00% <100.00%> (ø) |
| src/utils.jl | 93.02% <0.00%> (+4.38%) ⬆️ |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update cc3ab16...f4d6474.

davibarreira commented 2 years ago

> This PR adds sinkhorn_divergence due to @davibarreira in PR #92, except that I have removed the use of FiniteDiscreteMeasure, DiscreteNonParametric, etc. For now, the implementation takes measures as vectors of weights together with precomputed cost matrices. A future PR could implement wrappers that take generic distributions as inputs.
>
> Notably, in the case where the support is fixed and there is a common cost matrix between source and target measures (as is the case on a fixed discrete domain), sinkhorn_divergence now takes advantage of the batched Sinkhorn computation.

Thanks, @zsteve, for updating the PR. Why did you remove the use of FiniteDiscreteMeasure? I have to catch up on the recent changes you guys made.

davibarreira commented 2 years ago

> Nice, I wanted to update the PR this week, but since I am a bit busy, I'm happy that you took it up.
>
> I have two major suggestions/requests though:
>
> 1. I think we should allow both with and without regularization. POT includes the regularization term, as does Feydy et al., whereas Genevay et al. apparently did not.
> 2. Probably we don't want to use the standard Sinkhorn algorithm for the symmetric case, but instead the fixed point iteration in Feydy et al., which apparently converges much faster.

@devmotion what is this fixed point iteration? If I remember correctly, the paper of Feydy used Sinkhorn with a fixed number of iterations. Is this what you are referring to?

devmotion commented 2 years ago

No, I am not referring to fixed number of iterations. They use a fixed point iteration for the symmetric case that is different from the standard Sinkhorn algorithm and exploits the additional structure of the problem.
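For reference, the symmetric update in question (Eq. 25 of Feydy et al., up to notation) is the soft-min half-step

```latex
f \leftarrow \tfrac{1}{2}\bigl(f + T(f)\bigr),
\qquad
T(f)_i = -\varepsilon \log \sum_j \mu_j \exp\!\Bigl(\frac{f_j - C_{ij}}{\varepsilon}\Bigr),
```

which, unlike the alternating Sinkhorn updates, preserves the symmetry of the two potentials at every iteration.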

zsteve commented 2 years ago

> > This PR adds sinkhorn_divergence due to @davibarreira in PR #92, except that I have removed the use of FiniteDiscreteMeasure, DiscreteNonParametric, etc. For now, the implementation takes measures as vectors of weights together with precomputed cost matrices. A future PR could implement wrappers that take generic distributions as inputs. Notably, in the case where the support is fixed and there is a common cost matrix between source and target measures (as is the case on a fixed discrete domain), sinkhorn_divergence now takes advantage of the batched Sinkhorn computation.
>
> Thanks, @zsteve, for updating the PR. Why did you remove the use of FiniteDiscreteMeasure? I have to catch up on the recent changes you guys made.

I think that was either done earlier (not exactly sure when, or possibly never merged?). For the time being it's perhaps best to keep things as-is (specify vectors of weights; for empirical measures, the locations are encoded in the cost matrix). More than happy to reintroduce FiniteDiscreteMeasure though, and we can then e.g. write a wrapper API.

davibarreira commented 2 years ago

My bad, I think the implementation used DiscreteNonParametric.

zsteve commented 2 years ago

> My bad, I think the implementation used DiscreteNonParametric.

Ah ok, then that's from Distributions.jl, right?

davibarreira commented 2 years ago

> > My bad, I think the implementation used DiscreteNonParametric.
>
> Ah ok, then that's from Distributions.jl, right?

I'm reading the code again; it has been quite some time. But for now, the way you coded it seems fine. I can submit another PR later if needed.

zsteve commented 2 years ago

Ok, I think I've worked out the update for the symmetric Sinkhorn (Eq. 25 of Feydy et al.), but in the variables u = exp(f/eps):

log(u) <- (log(u) - log(K*(mu .* u)))/2

where K is the Gibbs kernel. This should probably (?) be implemented, e.g., as a SymmetricSinkhorn problem through the solve! API. What do you think, @devmotion @davibarreira?
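A minimal sketch of that iteration (the names here are hypothetical, not the proposed solve! API):

```julia
using LinearAlgebra

# Symmetric fixed-point iteration in the u = exp(f / eps) variables,
# with reference measure mu ⊗ mu.
function symmetric_sinkhorn_sketch(mu, C, eps; maxiter=1_000, tol=1e-9)
    K = exp.(-C ./ eps)                      # Gibbs kernel
    u = ones(length(mu))
    for _ in 1:maxiter
        unew = sqrt.(u ./ (K * (mu .* u)))   # the half-step update above
        norm(log.(unew) .- log.(u), Inf) < tol && return unew
        u = unew
    end
    return u
end
```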

zsteve commented 2 years ago

P.S. As part of testing autodiff with sinkhorn_divergence, I added a short example that demonstrates Figure 1 of Feydy et al. [image]

davibarreira commented 2 years ago

> P.S. As part of testing autodiff with sinkhorn_divergence, I added a short example that demonstrates Figure 1 of Feydy et al. [image]

Just to guarantee that we are on the same page, we are talking about the paper "Interpolating between optimal transport and MMD using Sinkhorn divergences", correct?

zsteve commented 2 years ago

Yes, this one: http://proceedings.mlr.press/v89/feydy19a/feydy19a.pdf. P.S. I guess by "demonstrates" I mean that it shows the same kind of result, not that it reproduces the exact figure.

devmotion commented 2 years ago

> Ok, I think I've worked out the update for the symmetric Sinkhorn (Eq. 25 of Feydy et al.), but in the variables u = exp(f/eps):
>
> log(u) <- (log(u) - log(K*(mu .* u)))/2
>
> where K is the Gibbs kernel. This should probably (?) be implemented, e.g., as a SymmetricSinkhorn problem through the solve! API. What do you think, @devmotion @davibarreira?

I think SymmetricSinkhorn might be a good approach, but I'd suggest not exporting it (I don't think it's very useful as a standalone algorithm). I would also assume it is numerically more stable to perform the calculations in log space, i.e., to work with f (or f / eps) instead of u (if this is what you suggested). That is, to use

f <- (f .- eps .* vec(logsumexp(log.(mu) .+ f ./ eps .- C' ./ eps; dims=1))) ./ 2

or

g <- (g .- vec(logsumexp(log.(mu) .+ g .- C' ./ eps; dims=1))) ./ 2

where g = f / eps.
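A runnable sketch of the second variant, iterating on g = f / eps (the function name and convergence check are illustrative):

```julia
using LogExpFunctions: logsumexp

# Log-domain symmetric fixed-point iteration; returns the potential f.
function symmetric_sinkhorn_log_sketch(mu, C, eps; maxiter=1_000, tol=1e-9)
    g = zeros(length(mu))
    logmu = log.(mu)
    for _ in 1:maxiter
        gnew = (g .- vec(logsumexp(logmu .+ g .- C' ./ eps; dims=1))) ./ 2
        maximum(abs, gnew .- g) < tol && return eps .* gnew
        g = gnew
    end
    return eps .* g
end
```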

zsteve commented 2 years ago

Agree with not exporting. In terms of the fixed point iteration, the current sinkhorn_gibbs algorithm doesn't do computation in log-space. I guess the upshot of that approach is that you can use BLAS for the multiplication with the kernel matrix K. Probably doesn't hurt to implement both, though.

zsteve commented 2 years ago

I've implemented SymmetricSinkhorn now and changed sinkhorn_divergence to take advantage of it for the correction terms. Running a few tests with debug output shows that SymmetricSinkhornGibbs indeed converges in an order of magnitude fewer iterations than SinkhornGibbs, so thanks @devmotion for pointing that out!

The scheme derived in Feydy et al. uses the reference measure μ ⊗ ν, whereas the reference measure used by sinkhorn2, etc. is 𝟙 ⊗ 𝟙. By Proposition 4.2 of Computational Optimal Transport (Peyré & Cuturi), the choice of reference measure is irrelevant up to a rescaling of the dual variables. In the Sinkhorn divergence formula, switching from one reference measure to another produces terms that cancel out.
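Concretely, for any γ ∈ Π(μ, ν) the two entropic penalties differ only by a constant:

```latex
\mathrm{KL}(\gamma \,\|\, \mu \otimes \nu)
  = \langle \gamma, \log \gamma \rangle
  - \langle \mu, \log \mu \rangle
  - \langle \nu, \log \nu \rangle ,
```

so the two choices of reference measure shift OT_ε(μ, ν) by ε(⟨μ, log μ⟩ + ⟨ν, log ν⟩), and in S_ε(μ, ν) = OT_ε(μ, ν) − ½ OT_ε(μ, μ) − ½ OT_ε(ν, ν) these shifts cancel.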

In the case u = v, where (u, v) are the dual variables used by the standard SinkhornGibbs, the symmetric-case fixed point iteration scheme of Feydy et al. becomes

u <- sqrt.(u .* mu ./ (K * u))

where u = exp.(f/eps) and f is as in Feydy et al. I've done some pen-and-paper checking of the math behind this, but please feel free to verify it for yourself as well.
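For the record, the pen-and-paper check is short: with reference measure 𝟙 ⊗ 𝟙 the plan is γ = diag(u) K diag(u), and at a fixed point of the update

```latex
u = \sqrt{\frac{u \odot \mu}{K u}}
\;\Longleftrightarrow\;
u \odot u \odot (K u) = u \odot \mu
\;\Longleftrightarrow\;
u \odot (K u) = \mu ,
```

which is exactly the marginal constraint γ 𝟙 = u ⊙ (K u) = μ (using u > 0 throughout).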

Another thing I noticed is that sinkhorn2 with regularization = true in fact computes the full transport plan γ. Once we have the dual variables (u, v), one can compute the loss (= cost + regulariser) using only matrix-vector products:

sinkhorn_loss = eps * (dot(u .* log.(u), K * v) + dot(v .* log.(v), K' * u))

When regularization = true, this should be a more efficient computation since it avoids allocating the transport plan matrix. As currently implemented, we still incur a vector allocation from K' * u: K' * u is computed throughout the Sinkhorn loop, but its output is not stored in a temporary variable. I think it's fine to leave it as-is, since the alternative would be to preallocate another temporary variable that's only used once.
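The identity behind that formula is algebraic, since log γ_ij = log u_i + log v_j − C_ij / ε holds for any positive duals; a toy check (all values illustrative):

```julia
using LinearAlgebra

n = 8
C = (x = rand(n); abs2.(x .- x'))      # toy squared-distance cost
ε = 0.1
K = exp.(-C ./ ε)
u, v = rand(n) .+ 0.5, rand(n) .+ 0.5  # any positive dual variables

γ = u .* K .* v'                       # plan = diag(u) * K * diag(v)
loss_plan = dot(γ, C) + ε * dot(γ, log.(γ))
loss_dual = ε * (dot(u .* log.(u), K * v) + dot(v .* log.(v), K' * u))
@assert loss_plan ≈ loss_dual          # matches without forming γ on the right
```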

zsteve commented 2 years ago

> Nice, I wanted to update the PR this week, but since I am a bit busy, I'm happy that you took it up.
>
> I have two major suggestions/requests though:
>
> 1. I think we should allow both with and without regularization. POT includes the regularization term, as does Feydy et al., whereas Genevay et al. apparently did not.

This is now addressed by the regularization argument, which is no longer ignored. The unit tests check both cases, regularization in (true, false).

> 2. Probably we don't want to use the standard Sinkhorn algorithm for the symmetric case, but instead the fixed point iteration in Feydy et al., which apparently converges much faster.

This is now addressed by the SymmetricSinkhorn algorithm.

zsteve commented 2 years ago

If everything looks good so far, I will bump the version and merge once I get a chance. @devmotion any idea what is going on with some of the failing actions? E.g., CI Julia 1/Ubuntu x64 fails with the following:

```
Run coverallsapp/github-action@master
Using lcov file: lcov.info
Error: Bad response: 405 <html>
<head><title>405 Not Allowed</title></head>
```

I'll look into it in the next few days since I'm a little busy right now, but I'm wondering if there's an easy fix.

devmotion commented 2 years ago

I think coveralls was down for maintenance.