PythonOT / POT

POT : Python Optimal Transport
https://PythonOT.github.io/
MIT License

A pure numpy implementation of the network simplex algorithm, `ot.emd(a,b,M)` #5

Closed AjayTalati closed 7 years ago

AjayTalati commented 7 years ago

Dear Remi,

thank you very much for releasing and documenting this package - it's really helpful to learn from :+1: I was wondering if there's a simpler/more explicit way to learn the network simplex algorithm?

I was looking on the web for very simple 1D emd code to help to compare the number of computational steps and accuracy of the unregularized linear program algorithm, with the regularized Sinkhorn-Knopp algorithms which you have here in pure numpy.

I could only find MATLAB code though, and I don't have access to MATLAB. I tried converting it to Octave, but the linear programming solver in Octave seems to be different from the MATLAB one, and I couldn't get the same values as ot.emd(a,b,M). I tried both Gaussians and simpler discrete distributions, but I couldn't find the problem.

It would be really great, and would help my understanding a lot, if I could find some simple numpy code to calculate the EMD by setting up the linear program of the network simplex algorithm, as you do in EMD_wrapper.cpp. I'm trying to do this with linprog's simplex method.

I just wondered if you know of any such code that's available? It would be really helpful for people new to optimal transport to see how the different algorithms work, side by side in numpy, and to compare their accuracy at a basic level.
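To make the question concrete, here is the kind of thing I mean: a minimal sketch that writes out the EMD linear program explicitly in numpy and hands it to `scipy.optimize.linprog` (this is a generic LP solve, not the network simplex itself, and only sensible for toy-sized histograms):

```python
import numpy as np
from scipy.optimize import linprog

def emd_lp(a, b, M):
    """Solve min <T, M> s.t. T @ 1 = a, T.T @ 1 = b, T >= 0,
    with the coupling T flattened row-major into n*m variables."""
    n, m = M.shape
    # Row-sum constraints: sum_j T[i, j] = a[i]
    A_rows = np.zeros((n, n * m))
    for i in range(n):
        A_rows[i, i * m:(i + 1) * m] = 1.0
    # Column-sum constraints: sum_i T[i, j] = b[j]
    A_cols = np.zeros((m, n * m))
    for j in range(m):
        A_cols[j, j::m] = 1.0
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([a, b])
    # method="highs" in recent SciPy (older versions had method="simplex")
    res = linprog(M.ravel(), A_eq=A_eq, b_eq=b_eq, method="highs")
    return res.fun, res.x.reshape(n, m)

# 8-bin toy example: uniform histogram vs mass split between the end bins
x = np.arange(8, dtype=float)
a = np.ones(8) / 8
b = np.zeros(8)
b[0], b[7] = 0.5, 0.5
M = (x[:, None] - x[None, :]) ** 2  # squared-distance cost
cost, T = emd_lp(a, b, M)
```

For this squared cost the optimal 1D plan is the monotone one, so the result can be checked by hand, and against ot.emd(a,b,M).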

All the best,

Ajay

AjayTalati commented 7 years ago

Hi Remi,

update :+1: it seems that R has a wrapper for the linear/integer program solver lp_solve, and this can be used to calculate the EMD; here's some working code:

https://gist.github.com/satojkovic/4979391

I'm guessing that because this code uses integer linear programming, (rather than linear programming), it's probably closer to the network simplex algorithm you use?

Best,

Ajay

rflamary commented 7 years ago

Hello Ajay,

I see why you would be interested in a pure Python implementation of EMD, but it would clearly lead to bad performance, especially compared to the C++ version we use in POT. We already experimented with the LP solver in cvxopt in the early version of POT available here, where you can see the LP EMD in the function computeTransportLP. Note that the computational time is really long compared to the network flow.

The code you reference above uses standard linear programming (not integer programming, unless I'm mistaken). The efficiency of the solution will depend on the solver, since some LP solvers can perform network flow, in which case the algorithm will be the same.

Best,

Rémi

AjayTalati commented 7 years ago

Hi Remi,

thank you very much for your helpful reply - the simple code that you linked is perfect for my level of understanding :+1:

At the moment I'm just working with very simple 1D histograms with 8 bins, to check my understanding of the different algorithms, basically in the very simplest cases like this diagram,

[image: simple flow diagram]

For such a small problem it seems rather silly to call a high-powered LP solver, so at the moment I just want to implement a small, simple network simplex solver to help my understanding.
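For the regularized side of the comparison, this is the kind of minimal pure-numpy Sinkhorn-Knopp sketch I'm checking against on the same 8-bin toy problem (illustrative only; ot.sinkhorn is the reference implementation, and the cost matrix is rescaled here to avoid underflow in the exponential):

```python
import numpy as np

def sinkhorn(a, b, M, reg=0.1, n_iter=1000):
    # Sinkhorn-Knopp: alternately rescale the kernel K = exp(-M/reg)
    # until the coupling T = diag(u) K diag(v) has marginals a and b.
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # match column marginals
        u = a / (K @ v)     # match row marginals
    T = u[:, None] * K * v[None, :]
    return np.sum(T * M), T

# 8-bin toy problem: uniform histogram vs mass split between the end bins
x = np.arange(8, dtype=float)
a = np.ones(8) / 8
b = np.zeros(8)
b[0], b[7] = 0.5, 0.5
M = (x[:, None] - x[None, :]) ** 2
M = M / M.max()  # rescale costs so exp(-M/reg) does not underflow
cost, T = sinkhorn(a, b, M, reg=0.1)
```

The regularized cost is a biased estimate of the true EMD (the plan is smoothed toward the independent coupling as reg grows), which is exactly the accuracy trade-off I want to look at side by side with the LP solution.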

The R package lpSolve has a function lp.transport, which I believe implements integer programming, according to the docs (on page 7),

Integer Programming for the Transportation Problem

Again though, this is nice for checking, but it doesn't really help with understanding the algorithm for purely educational purposes in such a small case.

Thank you once again for making the package available and for the clear documentation. I was working with a friend over the weekend, and he's coded versions of the SK algorithm in PyTorch. If you're interested, it should be possible to quickly understand the implementation from this forum thread,

Wasserstein Loss Layer

Perhaps it might be useful to you? It should now be possible to reproduce Marco Cuturi's original MNIST experiments, which would be quite educational :+1:

I'll try to tidy up my work and post it as a gist, as hopefully it will be helpful to others who are new to computing optimal transport. Perhaps it would be complementary to the Gaussian examples currently in the POT library?

Best,

Ajay

rflamary commented 7 years ago

Hello Ajay and thank you for the feedback.

Indeed the R solver handles integers, which is nice, but since integer programming is more complex than linear programming I think it might be less efficient than a classical floating-point LP solver (again, to be tested on large data).

Good luck with implementing a Sinkhorn Wasserstein layer in PyTorch; this could be fun and have several applications in the future. I will follow the PyTorch thread with interest.

If you are interested, I recently added to POT a Wasserstein dimensionality reduction method that requires the gradient of the Sinkhorn loss, which was handled by autograd and seems very efficient (at least far more than the initial Matlab implementation we had). I don't doubt PyTorch.autograd would do a good job, but I was surprised to see that Theano did not scale well for large problems (hence the autograd implementation).

Best,

Rémi

AjayTalati commented 7 years ago

Dear Remi,

thank you very much for the kind encouragement, I appreciate it a lot :) I think I understand the different variants of the SK algorithm better now.

I started reading your recent paper, Wasserstein Discriminant Analysis. I believe it's possible to calculate Jacobians in PyTorch (although the general method for calculating gradients of gradients will not be available for a few weeks),

https://discuss.pytorch.org/t/clarification-using-backward-on-non-scalars/1059/2

although I don't really understand how well this would scale to large data vectors, so I'm unsure whether it would improve your current method. I guess such methods could be used in neural networks to approximate density functional theory, but I'm a beginner so I'm not sure how to get started.

Kind regards,

Ajay