Vilin97 closed this issue 2 months ago.
Hey, @Vilin97.
You don't need to use histograms per se. You can sample from the distributions and compute either the Sinkhorn distance or the exact distance between the samples.
It has been some time since I last used the package. I'll recover some notebooks I have, and perhaps I can put together a quick example for your case. It should be straightforward.
Thank you for the answer, @davibarreira. Given X and Y, both d x n matrices (d is the dimension and n is the sample size), how can I compute the W2 distance between the empirical distributions given by X and Y? I could not work out how to do it from the documentation of sinkhorn.
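using OptimalTransport, Distances, Distributions, Tulip
using LinearAlgebra, Random

Random.seed!(3)

# Source measure μ: N samples from a standard 2D normal, uniform weights
σ1 = MvNormal(I(2))
N = 100
μ = fill(1 / N, N)
μsupport = rand(σ1, N)'

# Target measure ν: M samples from a 2D normal centered at (5, 5), uniform weights
M = 50
σ2 = MvNormal([5, 5], I(2))
ν = fill(1 / M, M)
νsupport = rand(σ2, M)'

# Cost matrix of squared Euclidean distances between support points (N×M)
C = pairwise(sqeuclidean, μsupport', νsupport'; dims=2)

# This is the exact total cost
γ = emd2(μ, ν, C, Tulip.Optimizer())

# This is the Sinkhorn cost with entropic regularization ε
ε = 0.5
s = sinkhorn2(μ, ν, C, ε)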
@Vilin97, does the code above answer your question? I sample from two multivariate normal distributions and build the corresponding empirical measures (sums of Diracs with uniform weights). Then I compute the cost matrix C using the squared Euclidean distance. I'm using Distances.jl for the sqeuclidean function and Tulip.jl for the Tulip.Optimizer().
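For the d x n matrix case asked about earlier, the same calls can be wrapped in a small function. This is only a sketch: `w2_empirical` is a hypothetical helper name, not something OptimalTransport.jl provides, and it assumes samples are stored column-wise with equal weight on every point. It uses `SqEuclidean()` from Distances.jl for the cost and takes the square root of the `emd2` cost, since that cost is the squared W2 distance when C holds squared Euclidean distances.

```julia
using OptimalTransport, Distances, Tulip

# Hypothetical helper (not part of OptimalTransport.jl): W2 distance between
# the empirical measures of two samples stored column-wise.
# X is d×n and Y is d×m; every sample point gets the same weight.
function w2_empirical(X::AbstractMatrix, Y::AbstractMatrix)
    n, m = size(X, 2), size(Y, 2)
    μ = fill(1 / n, n)                          # uniform weights on the columns of X
    ν = fill(1 / m, m)                          # uniform weights on the columns of Y
    C = pairwise(SqEuclidean(), X, Y; dims=2)   # C[i, j] = ‖X[:, i] - Y[:, j]‖², size n×m
    cost = emd2(μ, ν, C, Tulip.Optimizer())     # exact optimal transport cost (W2²)
    return sqrt(max(cost, 0))                   # clamp tiny negative solver round-off
end

X = randn(2, 100)
Y = randn(2, 50) .+ 5
w2_empirical(X, Y)
```

A quick sanity check is a pure translation: for `Y = X .+ t` the result should be close to `norm(t)`, up to solver tolerance.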
Thank you so much for this snippet! I will play around with it when I get to my laptop but from the first glance it looks like exactly what I wanted. Thank you!
The code you gave works. Thank you!
I have two distributions in d-dimensional space, between which I want to compute the Wasserstein distance. One distribution is a sum of Dirac delta functions (i.e. an empirical distribution), and the other is given by a density (e.g. a Gaussian). Is my best option to compute histograms of both and then compute the distance between the histograms? I don't like this approach because the result will depend on the bin width, and choosing the bin width is a hard problem. Is there a better way?
Here is what I have so far:
Questions:
size(C) == (size(μ, 1), size(ν, 1)) in checksize. I don't quite understand what C should be when μ and ν are not vector-valued.
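For reference, here is one standard way to write the discrete problem for two empirical measures. The notation (x_i, y_j, γ) is chosen here for illustration and is not taken from the package documentation. The point is that μ and ν are always vectors of weights, while the d-dimensional support points only enter through the cost matrix C, which is why its size has to be (length of μ, length of ν).

```latex
\mu = \sum_{i=1}^{n} \mu_i \, \delta_{x_i}, \qquad
\nu = \sum_{j=1}^{m} \nu_j \, \delta_{y_j}, \qquad
x_i, y_j \in \mathbb{R}^d,

C_{ij} = \lVert x_i - y_j \rVert^2, \qquad C \in \mathbb{R}^{n \times m},

W_2^2(\mu, \nu) = \min_{\gamma \in \mathbb{R}_{\ge 0}^{n \times m}}
  \sum_{i=1}^{n} \sum_{j=1}^{m} \gamma_{ij} \, C_{ij}
\quad \text{subject to} \quad
\sum_{j=1}^{m} \gamma_{ij} = \mu_i, \qquad
\sum_{i=1}^{n} \gamma_{ij} = \nu_j .
```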