Please add sliced algorithms

PythonOT / POT

POT : Python Optimal Transport

https://PythonOT.github.io/

MIT License


Please add sliced algorithms #185

Closed bionicles closed 2 years ago

bionicles commented 4 years ago

To handle larger problems faster and with less memory, would it be possible to add an ot.sliced module?

Papers: sliced Wasserstein and sliced Gromov-Wasserstein. The Cramer-Wold distance takes an integral approach, which might be nice for performance. Anchor energy and anchor Wasserstein also seem promising.

A sliced version of FGW would be useful right away for molecular biology: we could measure distances between large molecules (antibodies, vaccines, virus proteins, etc.), which can have upwards of a hundred thousand points with many feature dimensions (mass, charge, etc.). The same goes for point clouds in self-driving cars and other geometric applications. This could make POT useful in production.

This recent paper, "Statistical and Topological Properties of Sliced Probability Divergences", is optimistic about the quality of results from such metrics.

rflamary commented 4 years ago

I agree it would be interesting to provide sliced Wasserstein and sliced Gromov-Wasserstein distances in POT, but it will not be done before the NeurIPS deadline.

In the meantime, we have a solver for 1D Wasserstein here: https://pythonot.github.io/gen_modules/ot.lp.html#ot.lp.emd2_1d. Sliced Wasserstein only requires projecting the samples onto random directions and looping over the projections, which is a few lines of code. If both distributions have the same number of samples, each 1D problem is just a sort, which is even faster.
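The recipe above (random directions, projection, one 1D transport problem per direction) can be sketched in plain NumPy for the equal-sample-size case, where each 1D problem reduces to a sort. This is a minimal sketch, not POT's implementation: the function name `sliced_wasserstein_sorted` and its parameters are hypothetical, and uniform sample weights are assumed. With unequal sample counts or non-uniform weights, the sorting step would have to be replaced by a 1D solver such as the `ot.lp.emd2_1d` linked above.

```python
import numpy as np


def sliced_wasserstein_sorted(xs, xt, n_projections=50, seed=None):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    Assumes xs and xt have the same shape (n_samples, d) and uniform weights,
    so each 1D transport problem is solved by sorting.
    """
    xs = np.asarray(xs, dtype=float)
    xt = np.asarray(xt, dtype=float)
    if xs.shape != xt.shape:
        raise ValueError("this sketch requires the same number of samples and dimensions")

    rng = np.random.default_rng(seed)
    d = xs.shape[1]

    # Draw random directions and normalize them onto the unit sphere.
    directions = rng.normal(size=(d, n_projections))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)

    # Project both samples onto every direction, then sort each projection.
    xs_sorted = np.sort(xs @ directions, axis=0)
    xt_sorted = np.sort(xt @ directions, axis=0)

    # Squared 1D W2 per direction: mean squared gap between sorted samples.
    w2_squared = np.mean((xs_sorted - xt_sorted) ** 2, axis=0)

    # Average over directions, then take the square root.
    return float(np.sqrt(np.mean(w2_squared)))
```

The loop over directions is vectorized here as a single matrix product, which is usually the simplest choice when all projections fit in memory.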

We also have a PR not yet merged but with the implementation for gromov-1D here https://github.com/PythonOT/POT/pull/129

Note that you are welcome to provide a PR if you implement it before us.

LoryPack commented 4 years ago

Hi, I noticed that a sliced_wasserstein implementation has been added to the code and also appears already in the documentation, but it is not yet available if installing via pip. Any idea about when that will be available?

Thanks so much

rflamary commented 4 years ago

On the next release ;).

Feel free to use the master version of POT, which you can install with:

pip install -U https://github.com/PythonOT/POT/archive/master.zip

LoryPack commented 4 years ago

Thanks for that. Looking forward to the next release then

rflamary commented 2 years ago

Sliced Wasserstein has been added in release 0.8, so I am closing this issue.

LoryPack commented 2 years ago

Hello, thanks a lot for this!

I was comparing the results of the old sliced Wasserstein distance at this point in time with those of the new one you just released. I noticed that the old and new get_random_projections functions give different results (after fixing the numpy random number generator and accounting for the transposed matrix). The sliced_wasserstein_distance function, however, gives the same result once the projections are fixed. I just wanted to understand whether this is due to the randomness being handled in a different way (so that reproducibility has been lost between commits) or whether there is a substantial difference. Thanks a lot!

rflamary commented 2 years ago

Yes, it is due to the fact that we had to find a way to have reproducible functions across backends (we now have a number of backends). The way the matrices are stored can also change the result, but all the rest is exactly the same algorithm.
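To illustrate how storage layout alone can change the projections, here is a small NumPy-only sketch. It does not reproduce POT's old or new `get_random_projections`; the two drawing strategies and the helper names `normalize_columns` and `sliced_w2` are assumptions made for illustration. It shows that the same seed yields different directions depending on the shape in which the numbers are drawn, while the distance is deterministic once the projection matrix is fixed.

```python
import numpy as np


def normalize_columns(raw):
    """Scale each column of a (d, n_projections) array to unit length."""
    return raw / np.linalg.norm(raw, axis=0, keepdims=True)


def sliced_w2(xs, xt, projections):
    """Sliced W2 for equal-size, uniformly weighted samples and fixed directions."""
    xs_sorted = np.sort(xs @ projections, axis=0)
    xt_sorted = np.sort(xt @ projections, axis=0)
    return float(np.sqrt(np.mean((xs_sorted - xt_sorted) ** 2)))


d, n_projections, seed = 3, 4, 0

# Hypothetical layout A: draw (n_projections, d), then transpose.
draw_then_transpose = normalize_columns(
    np.random.RandomState(seed).randn(n_projections, d).T
)
# Hypothetical layout B: draw (d, n_projections) directly.
draw_directly = normalize_columns(
    np.random.RandomState(seed).randn(d, n_projections)
)

# Same seed and same stream of numbers, but they land in different cells.
same_directions = np.allclose(draw_then_transpose, draw_directly)

data_rng = np.random.RandomState(1)
xs = data_rng.randn(50, d)
xt = data_rng.randn(50, d) + 1.0

# With the projection matrix fixed, repeated calls agree exactly.
first = sliced_w2(xs, xt, draw_directly)
second = sliced_w2(xs, xt, draw_directly)

print("directions identical:", same_directions)
print("distance reproducible:", first == second)
```

In practice this means that comparing results across versions is most reliable when the projection matrix itself is generated once and passed in explicitly, rather than relying on a shared seed.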

LoryPack commented 2 years ago

That is great, thanks a lot for clarifying!