Closed zsteve closed 2 years ago
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/entropic/sinkhorn.jl | 16 | 77.14% |

| Totals | |
|---|---|
| Change from base Build 1227604199: | -2.7% |
| Covered Lines: | 508 |
| Relevant Lines: | 554 |
Merging #145 (f4d6474) into master (cc3ab16) will decrease coverage by 9.14%. The diff coverage is 30.00%.

```
@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
- Coverage   94.42%   85.27%   -9.15%
==========================================
  Files          12       14       +2
  Lines         538      618      +80
==========================================
+ Hits          508      527      +19
- Misses         30       91      +61
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/entropic/sinkhorn_divergence.jl | 0.00% <0.00%> (ø) | |
| src/entropic/sinkhorn_stabilized.jl | 100.00% <ø> (ø) | |
| src/entropic/symmetric.jl | 0.00% <0.00%> (ø) | |
| src/entropic/sinkhorn.jl | 100.00% <100.00%> (ø) | |
| src/entropic/sinkhorn_epsscaling.jl | 95.23% <100.00%> (ø) | |
| src/entropic/sinkhorn_gibbs.jl | 100.00% <100.00%> (ø) | |
| src/entropic/sinkhorn_solve.jl | 100.00% <100.00%> (ø) | |
| src/utils.jl | 93.02% <0.00%> (+4.38%) | :arrow_up: |

Continue to review the full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update cc3ab16...f4d6474. Read the comment docs.
This PR adds `sinkhorn_divergence` due to @davibarreira in PR #92, except that I have removed the use of `FiniteDiscreteMeasure`, `DiscreteNonParametric`, etc. For now, the implementation takes measures as vectors of weights, with precomputed cost matrices supplied by the user. A future PR could implement wrappers that take generic distributions as inputs. Notably, in the case where the support is fixed and there is a common cost matrix between the source and target measures (as is the case on a fixed discrete domain), `sinkhorn_divergence` now takes advantage of the batched Sinkhorn computation.
Thanks, @zsteve, for updating the PR. Why did you remove the use of `FiniteDiscreteMeasure`? I have to catch up on the recent changes you guys made.
Nice, I wanted to update the PR this week but since I am a bit busy I'm happy that you took it up.
I have two major suggestions/requests though:
1. I think we should allow both with and without regularization. POT includes the regularization, as does Feydy et al., whereas Genevay et al. did not, it seems.
2. We probably don't want to use the standard Sinkhorn algorithm for the symmetric case, but instead the fixed point iteration in Feydy et al., which apparently converges much faster.
@devmotion what is this fixed point iteration? If I remember correctly, the paper of Feydy used Sinkhorn with a fixed number of iterations. Is this what you are referring to?
No, I am not referring to fixed number of iterations. They use a fixed point iteration for the symmetric case that is different from the standard Sinkhorn algorithm and exploits the additional structure of the problem.
> Thanks, @zsteve, for updating the PR. Why did you remove the use of `FiniteDiscreteMeasure`? I have to catch up on the recent changes you guys made.
I think that was either done earlier (not exactly sure when, or possibly never merged?). For the time being it's perhaps best to keep things as-is (specify vectors of weights; for empirical measures, the locations are encoded in the cost matrix). More than happy to reintroduce `FiniteDiscreteMeasure` later, though, and then e.g. write a wrapper API.
My bad, I think the implementation used `DiscreteNonParametric`.
> My bad, I think the implementation used `DiscreteNonParametric`.

Ah ok, then that's from Distributions.jl, right?
> Ah ok, then that's from Distributions.jl, right?

I'm reading the code again, it has been quite some time. But for now, the way you coded it seems fine. I can submit another PR later if needed.
Ok, I think I've worked out the update for the symmetric Sinkhorn (Eq. 25 of Feydy et al.), in the variables `u = exp(f/eps)`:

```
log(u) <- (log(u) - log(K*(mu .* u)))/2
```

where `K` is the Gibbs kernel. This should probably (?) be implemented, e.g., as a `SymmetricSinkhorn` problem through the `solve!` API. What do you think @devmotion @davibarreira
P.S. As part of testing autodiff with `sinkhorn_divergence`, I've added a short example that demonstrates Figure 1 of Feydy et al.
> P.S. As part of testing autodiff with `sinkhorn_divergence`, I've added a short example that demonstrates Figure 1 of Feydy et al.

Just to guarantee that we are on the same page: we are talking about the paper "Interpolating between Optimal Transport and MMD using Sinkhorn Divergences", correct?
Yes, this one: http://proceedings.mlr.press/v89/feydy19a/feydy19a.pdf. P.S. I guess by "demonstrates" I mean that it shows the same kind of result, not that it reproduces the exact figure.
> Ok, I think I've worked out the update for the symmetric Sinkhorn (Eq. 25 of Feydy et al.), in the variables `u = exp(f/eps)`: `log(u) <- (log(u) - log(K*(mu .* u)))/2`, where `K` is the Gibbs kernel. This should probably (?) be implemented, e.g., as a `SymmetricSinkhorn` problem through the `solve!` API.
I think `SymmetricSinkhorn` might be a good approach, but I'd suggest not exporting it (I don't think it's very useful as a standalone algorithm). I would assume that it would be numerically more stable to perform the calculations in log space, i.e., to work with `f` (or `f / eps`) instead of `u` (if this is what you suggested). I.e., to use

```
f <- (f .- eps .* vec(logsumexp(log.(mu) .+ f ./ eps .- C' ./ eps; dims=1))) ./ 2
```

or

```
g <- (g .- vec(logsumexp(log.(mu) .+ g .- C' ./ eps; dims=1))) ./ 2
```

where `g = f / eps`.
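For concreteness, the log-space update above can be sketched as a small standalone function (a sketch only, not the package implementation; `logsumexp` is from LogExpFunctions.jl, and the name `symmetric_potential` is made up for illustration):

```julia
using LogExpFunctions: logsumexp  # logsumexp(A; dims) reduces along the given dims

# Log-domain symmetric fixed point iteration in the variable g = f / eps.
# At a fixed point, g = -vec(logsumexp(log.(mu) .+ g .- C' ./ eps; dims=1)),
# which is the symmetric marginal condition.
function symmetric_potential(mu, C, eps; maxiter=1_000, tol=1e-9)
    g = zero(mu)
    for _ in 1:maxiter
        g_new = (g .- vec(logsumexp(log.(mu) .+ g .- C' ./ eps; dims=1))) ./ 2
        converged = maximum(abs.(g_new .- g)) < tol
        g = g_new
        converged && break
    end
    return eps .* g  # return the potential f = eps * g
end

mu = [0.2, 0.3, 0.5]
x = [0.0, 1.0, 2.0]
C = (x .- x') .^ 2
f = symmetric_potential(mu, C, 0.1)
```

Because the averaged update halves the error at each step, the residual of the marginal condition shrinks quickly, without ever exponentiating the potentials.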
Agree with not exporting. In terms of the fixed point iteration, the current `sinkhorn_gibbs` algorithm doesn't do its computation in log space. I guess the upside of that approach is that you can use BLAS for the multiplication with the kernel matrix `K`. Probably doesn't hurt to implement both, though.
I've implemented `SymmetricSinkhorn` now and changed `sinkhorn_divergence` to take advantage of it for the correction terms. Running a few tests with debug output shown reveals that `SymmetricSinkhornGibbs` indeed converges in an order of magnitude fewer iterations than `SinkhornGibbs`, so thanks @devmotion for pointing that out!
The scheme derived in Feydy et al. uses the reference measure `\mu \otimes \nu`, whereas the reference measure used by `sinkhorn2`, etc. is `\mathbf{1} \otimes \mathbf{1}`. From Proposition 4.2 of Computational Optimal Transport (G. Peyré), the choice of reference measure is irrelevant and just amounts to a choice of scaling. In the Sinkhorn divergence formula, switching from one reference measure to another produces terms that cancel out.
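To make the cancellation explicit, here is a short derivation sketch (my notation, stated informally, not quoted from the book): for a plan $\gamma$ with marginals $\gamma \mathbf{1} = \mu$ and $\gamma^\top \mathbf{1} = \nu$,

```latex
\mathrm{KL}(\gamma \mid \mu \otimes \nu)
  = \sum_{ij} \gamma_{ij} \log \gamma_{ij}
    - \sum_{ij} \gamma_{ij} \left( \log \mu_i + \log \nu_j \right)
  = \mathrm{KL}(\gamma \mid \mathbf{1} \otimes \mathbf{1})
    - \langle \mu, \log \mu \rangle - \langle \nu, \log \nu \rangle .
```

The extra terms do not depend on $\gamma$, so the optimal plan is unchanged and the dual scalings simply absorb elementwise factors of $\mu$ and $\nu$. In the Sinkhorn divergence $\mathrm{OT}(\mu,\nu) - \tfrac12 \mathrm{OT}(\mu,\mu) - \tfrac12 \mathrm{OT}(\nu,\nu)$, these constant shifts cancel exactly between the cross term and the two symmetric correction terms.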
When `u = v` are the dual variables used by the standard `SinkhornGibbs`, the symmetric-case fixed point iteration scheme of Feydy should be

```
u <- sqrt.(u .* mu ./ (K * u))
```

where `u = exp.(f ./ eps)` and `f` is as in Feydy et al. I've done some pen-and-paper checking of the math behind this, but please feel free to verify it yourself.
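As a sanity check on the update above, here is a minimal standalone sketch (illustrative names, not the package API) that iterates the scaling-form update and checks the resulting plan `diag(u) * K * diag(u)` has marginal `mu` (at a fixed point, `u .* (K * u) == mu`):

```julia
# Scaling-form symmetric fixed point iteration; names are illustrative only.
function symmetric_sinkhorn_u(mu, C, eps; maxiter=1_000, tol=1e-10)
    K = exp.(-C ./ eps)  # Gibbs kernel
    u = ones(length(mu))
    for _ in 1:maxiter
        u_new = sqrt.(u .* mu ./ (K * u))
        converged = maximum(abs.(u_new .- u)) < tol
        u = u_new
        converged && break
    end
    return u
end

mu = [0.25, 0.25, 0.5]
x = [0.0, 0.5, 1.0]
C = (x .- x') .^ 2
eps = 0.5
u = symmetric_sinkhorn_u(mu, C, eps)
gamma = u .* exp.(-C ./ eps) .* u'  # plan diag(u) * K * diag(u)
# rows of gamma sum to mu once the iteration has converged
```

The square root implements the half-step averaging in log space, which is what gives the faster convergence compared with alternating full Sinkhorn updates.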
Another thing that I notice is that `sinkhorn2` with `regularization = true` in fact computes the full transport plan `\gamma`. When we have the dual variables `(u, v)`, one can instead compute the loss (= cost + regulariser) using only matrix-vector products:

```
sinkhorn_loss = eps * (dot(u .* log.(u), K * v) + dot(v .* log.(v), K' * u))
```

When `regularization = true`, this should be a more efficient computation, as it avoids allocating the transport plan matrix. As currently implemented, we still incur a vector allocation from `K' * u`, since `K' * u` is computed throughout the Sinkhorn loop but its output is not stored in a temp variable. I think it's fine to leave as-is, since the alternative would be to preallocate another temp variable that's only used once.
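The matrix-vector form above is an exact algebraic identity, since `log(gamma[i,j]) = log(u[i]) + log(v[j]) - C[i,j]/eps` makes the cost term cancel against the entropy term. A small standalone check (not the package internals), assuming the plan `gamma = diag(u) * K * diag(v)` with `K = exp.(-C ./ eps)`:

```julia
using LinearAlgebra: dot

# Plain Sinkhorn scalings for a tiny problem; helper name is illustrative.
function sinkhorn_scalings(mu, nu, K; iters=500)
    u = ones(length(mu))
    v = ones(length(nu))
    for _ in 1:iters
        u = mu ./ (K * v)
        v = nu ./ (K' * u)
    end
    return u, v
end

mu = [0.3, 0.7]
nu = [0.6, 0.4]
C = [0.0 1.0; 1.0 0.0]
eps = 0.5
K = exp.(-C ./ eps)
u, v = sinkhorn_scalings(mu, nu, K)

gamma = u .* K .* v'                                         # plan diag(u) K diag(v)
dense_loss = dot(gamma, C) + eps * dot(gamma, log.(gamma))   # <γ, C> + ε <γ, log γ>
matvec_loss = eps * (dot(u .* log.(u), K * v) + dot(v .* log.(v), K' * u))
# dense_loss and matvec_loss agree to floating point precision
```

Note the identity holds for any positive `(u, v)`, converged or not, so the matrix-vector form can safely replace the dense computation whenever the plan itself is not needed.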
> Nice, I wanted to update the PR this week but since I am a bit busy I'm happy that you took it up.
> I have two major suggestions/requests though:
>
> 1. I think we should allow both with and without regularization. POT includes the regularization, as does Feydy et al., whereas Genevay et al. did not, it seems.

This is now addressed by the `regularization` argument, which is no longer ignored. The unit tests check both cases, `regularization in (true, false)`.

> 2. Probably we don't want to use the standard Sinkhorn algorithm for the symmetric case but instead the fixed point iteration in Feydy et al., which apparently converges much faster.

This is now addressed by the `SymmetricSinkhorn` algorithm.
If everything looks good so far, I will bump the version and merge once I get a chance. @devmotion any idea what is going on with some of the failing actions? E.g., CI Julia 1/Ubuntu x64 fails with the following:

```
Run coverallsapp/github-action@master
Using lcov file: lcov.info
Error: Bad response: 405 <html>
<head><title>405 Not Allowed</title></head>
```

I'll look into it in the next few days since I'm a little busy right now, but I'm wondering if there's an easy fix.
I think coveralls was down for maintenance.