JuliaOptimalTransport / OptimalTransport.jl

Optimal transport algorithms for Julia
https://juliaoptimaltransport.github.io/OptimalTransport.jl/dev
MIT License

Optimal transport between dataset and discrete bivariate distribution. #175

Closed: lrnv closed this 1 year ago

lrnv commented 1 year ago

Hey,

I have a bivariate dataset and a bivariate distribution defined as:

```julia
using Distributions
n = 100
data = randn(n, 2)
MarginDist(data, i) = DiscreteNonParametric(data[:, i], ones(size(data, 1)) / size(data, 1))
D = product_distribution(MarginDist(data, 2), MarginDist(data, 1))
```

How should I go about computing (or at least approximating) the Wasserstein distance (cost = squared Euclidean norm) between the dataset `data` and the distribution `D`? Note that the marginals are swapped, so that they match each other when the distance is minimized, and that the dependence structure of `D` is independence. Both choices are on purpose.
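For reference, here is one way to write out the quantity in question. The notation is introduced here for illustration and is not from the original post: $x_i = (x_{i,1}, x_{i,2})$ is the $i$-th row of `data`, $\hat\mu_n$ is the empirical measure of the dataset, and the values in each column are assumed distinct, so that `D` has $n^2$ atoms of weight $1/n^2$.

$$
W_2^2(\hat\mu_n, D) \;=\; \min_{\gamma \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \gamma_{i,(j,k)} \, \bigl\| x_i - (x_{j,2},\, x_{k,1}) \bigr\|^2
\quad \text{s.t.} \quad
\sum_{j,k} \gamma_{i,(j,k)} = \frac{1}{n}, \qquad \sum_{i} \gamma_{i,(j,k)} = \frac{1}{n^2}.
$$

This is a linear program with $n \cdot n^2$ variables, which is why the size of the support of `D` matters.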

zsteve commented 1 year ago

Apologies for the slow response! I need to set up email notifications.

Is this related to the entropic OT independence criterion? https://proceedings.mlr.press/v151/liu22h/liu22h.pdf

Probably the most straightforward way to do this would be to just sample from `D` and then use e.g. `emd` or `sinkhorn_divergence`. Alternatively, you could construct the product of marginals exactly, although this would be quadratic in the number of points (e.g. 100^2 points on the product).
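A minimal sketch of the exact-product option, for illustration only. It assumes the `emd2(μ, ν, C, optimizer)` form with Tulip as the LP solver, and `pairwise` with `SqEuclidean` from Distances.jl for the cost matrix. The names `atoms`, `μ`, `ν`, `C` and `w2sq` are made up here, and `n` is reduced from 100 to 30 to keep the linear program small.

```julia
using Distances          # pairwise, SqEuclidean
using OptimalTransport   # emd2
using Tulip              # LP solver handed to emd2 (an assumption; other LP solvers may work too)

n = 30                   # smaller than the 100 in the question, to keep the LP small
data = randn(n, 2)

# Empirical measure of the dataset: n atoms of weight 1/n.
μ = fill(1 / n, n)

# Exact support of D: all pairs (data[i, 2], data[j, 1]), i.e. n^2 atoms.
# `inner` repeats each entry n times in a row, `outer` tiles the whole column n times.
atoms = hcat(repeat(data[:, 2]; inner = n), repeat(data[:, 1]; outer = n))
ν = fill(1 / n^2, n^2)

# Squared Euclidean cost between every data point (row) and every atom (row).
C = pairwise(SqEuclidean(), data, atoms; dims = 1)

# Optimal transport cost, i.e. the squared 2-Wasserstein distance.
w2sq = emd2(μ, ν, C, Tulip.Optimizer())
```

The cost matrix has `n * n^2` entries, so this stops being practical fairly quickly as `n` grows.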

lrnv commented 1 year ago

Hey,

No, it is not related, but thanks for the ref. I am indeed trying to enforce independence, but also, at the same time, that the two marginals are the same.

As you noted, the number of atoms of the second distribution is quadratic. I will try sampling a smaller number of them, as you suggest.
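A rough sketch of the sampling approach, under the same assumptions about `emd2`, Tulip and Distances.jl as in the package documentation. The helper `sampled_w2sq` and its arguments are hypothetical names. Since `D` is a product of the two empirical marginals, drawing row indices independently for each coordinate is equivalent to sampling from `D`.

```julia
using Random
using Distances          # pairwise, SqEuclidean
using OptimalTransport   # emd2
using Tulip              # LP solver handed to emd2 (an assumption)

# Hypothetical helper: squared 2-Wasserstein distance between the dataset
# and m atoms sampled from D (independent coordinates, swapped marginals).
function sampled_w2sq(data::AbstractMatrix, m::Integer; rng = Random.default_rng())
    n = size(data, 1)
    # First coordinate drawn from column 2, second from column 1, independently.
    samples = hcat(data[rand(rng, 1:n, m), 2], data[rand(rng, 1:n, m), 1])
    μ = fill(1 / n, n)   # uniform weights on the dataset
    ν = fill(1 / m, m)   # uniform weights on the sampled atoms
    C = pairwise(SqEuclidean(), data, samples; dims = 1)
    return emd2(μ, ν, C, Tulip.Optimizer())
end

data = randn(MersenneTwister(1), 100, 2)
estimate = sampled_w2sq(data, 200; rng = MersenneTwister(2))
```

The result is a noisy estimate that changes from draw to draw. Averaging over a few draws, or increasing `m`, is one way to stabilize it.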

Thanks, this can of course be closed, as it is not an issue with OptimalTransport.jl. Great package btw :)