PythonOT / POT

POT : Python Optimal Transport
https://PythonOT.github.io/
MIT License

Instability in Sinkhorn-Knopp convergence #321

Open sarahboufelja opened 2 years ago

sarahboufelja commented 2 years ago

Describe the bug

I am using the Sinkhorn and Sinkhorn with Group Lasso implementations in the POT package to reproduce the results in the paper "Optimal Transport for Domain Adaptation" by Nicolas Courty et al. However, if I run the following code a few times, I get inconsistent convergence results:

ot.da.SinkhornLpl1Transport(reg_e=10, reg_cl=1e0)
ot.da.SinkhornTransport(reg_e=100)

Sometimes the same code on the same data converges with no errors, and sometimes the algorithm fails to converge. I did try different regularization rates, but I don't want to increase the rate significantly, as this would obviously lead to a uniform mapping. Is this a known convergence issue with the Sinkhorn implementation? How would you choose the right regularization rate, with cross-validation?
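
On the regularization question, here is a minimal sketch of one possible cross-validation scheme, assuming a small labeled validation split in the target domain is available (in a fully unsupervised setting something like circular validation would be needed instead). The helper score_reg_e and the scikit-learn 1-NN scorer are illustrative choices, not part of POT:

import ot
from sklearn.neighbors import KNeighborsClassifier

def score_reg_e(Xs, ys, Xt_train, Xt_val, yt_val, reg_e):
    # Fit a Sinkhorn transport for this reg_e, transport the sources,
    # then score a 1-NN classifier on the labeled target validation split.
    mapper = ot.da.SinkhornTransport(reg_e=reg_e)
    mapper.fit(Xs=Xs, ys=ys, Xt=Xt_train)
    Xs_mapped = mapper.transform(Xs=Xs)  # barycentric mapping of the sources
    clf = KNeighborsClassifier(n_neighbors=1).fit(Xs_mapped, ys)
    return clf.score(Xt_val, yt_val)

# reg_grid = [1e-2, 1e-1, 1e0, 1e1, 1e2]
# best = max(reg_grid, key=lambda r: score_reg_e(Xs, ys, Xt_tr, Xt_val, yt_val, r))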

To Reproduce

Steps to reproduce the behavior:

  1. Extract the DeCAF features from the 6th and 7th layers of a pre-trained AlexNet, for both the MNIST and USPS data
  2. Use ot.da.SinkhornLpl1Transport(reg_e=10, reg_cl=1e0) and ot.da.SinkhornTransport(reg_e=100) to compute the optimal mapping between MNIST and USPS
  3. Vary the regularization rate from 1e-3 to 100
  4. Run each experiment 10 times to assess the consistency of convergence (see the sketch below)
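
A minimal sketch of step 4, assuming Xs, ys and Xt already hold the DeCAF features and source labels (placeholder names). It repeats the fit and records whether POT emitted a numerical-error warning; matching on the string "numerical errors" is an assumption about POT's warning text:

import warnings

import ot

def run_once(Xs, ys, Xt, reg_e=10, reg_cl=1e0):
    # Fit once and report whether POT warned about numerical errors.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mapper = ot.da.SinkhornLpl1Transport(reg_e=reg_e, reg_cl=reg_cl)
        mapper.fit(Xs=Xs, ys=ys, Xt=Xt)
    failed = any("numerical errors" in str(w.message) for w in caught)
    return mapper.coupling_, failed

# failures = sum(run_once(Xs, ys, Xt)[1] for _ in range(10))
# print(failures, "/ 10 runs hit numerical errors")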

Screenshots

Code sample

Expected behavior

I am expecting the Sinkhorn algorithm to converge consistently to the optimal coupling.

Environment (please complete the following information):

Output of the following code snippet:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)

Additional context

ncourty commented 2 years ago

Sorry for the late reply. Yes, regularization values should be chosen with care, as the right range depends closely on the nature of the data (and cost) at hand. Still, the behavior should be consistent between several runs with exactly the same data and regularization parameters. If that is not the case, could you set up a small running example that does not require extra steps or data, which we could work on?
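
For what it is worth, a self-contained starting point could look like the sketch below, with synthetic Gaussian features standing in for the DeCAF vectors (whether it actually reproduces the instability is exactly what needs checking). It calls ot.sinkhorn directly with log=True so the convergence trace can be inspected for each regularization value; with reg very small relative to the cost scale, the scaling vectors can over/underflow, in which case POT warns and stops early:

import numpy as np
import ot

rng = np.random.RandomState(0)
Xs = rng.randn(100, 64)            # stand-ins for source DeCAF features
Xt = rng.randn(120, 64) + 0.5      # stand-ins for target DeCAF features
M = ot.dist(Xs, Xt)                # squared Euclidean cost matrix
a, b = ot.unif(100), ot.unif(120)  # uniform marginals

for reg in [1e-3, 1e-1, 1e0, 1e1, 1e2]:
    G, log = ot.sinkhorn(a, b, M, reg, log=True)
    err = log['err'][-1] if log['err'] else float('nan')  # may stop before logging
    viol = np.abs(G.sum(axis=1) - a).max()  # marginal constraint violation
    print(f"reg={reg:g}  last err={err:.2e}  marginal violation={viol:.2e}")

If the failures do come from over/underflow at small reg, the log-domain variant (ot.sinkhorn(..., method='sinkhorn_stabilized'), available in recent POT versions) may behave more consistently across runs.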