Closed davibarreira closed 1 year ago
This looks like a nice addition. Is there a reason we use discretemeasure
instead of DiscreteNonParametric
directly? Perhaps in order to support the use-case as you have, where distances are batched across rows? (sorry I think I missed some of the previous discussion about this).
Otherwise this makes sense, and looks like it should be straightforward to implement as wrappers for the existing sinkhorn
/etc. functions!
I imagine we will be aiming to keep the separation between exact/regularised OT between OptimalTransport.jl
and ExactOptimalTransport.jl
. Would you then export discretemeasure
from both packages? Perhaps it would make more sense then to export discretemeasure
from OptimalTransport and then make it a dependency of ExactOptimalTransport.
The DiscreteNonParametric
type is from Distributions.jl
and it's only for 1D distributions. So the discretemeasure
function dispatches to either DiscreteNonParametric
when the vector is 1D, and otherwise it dispatches to a type called FiniteDiscreteMeasure which we have created. This FiniteDiscreteMeasure is a placeholder solution. The idea would be to have something more robust implemented elsewhere.
About where the discretemeasure should be, I'm guessing it should be on OptimalTransport.jl, but that would make ExactOptimalTransport dependent on it, no?
I'm trying to create a package with a nice implementation of FiniteDiscreteMeasure, and I would ship a discretemeasure
function. So our packages could only have that as a dependency.
About where the discretemeasure should be, I'm guessing it should be on OptimalTransport.jl, but that would make ExactOptimalTransport dependent on it, no?
Ideally we would generalize DiscreteNonParametric
in Distributions to arbitrary variates (i.e., not only uni- and multivariate), possibly with a different name to avoid breaking changes.
@devmotion What do you mean arbitrary variates? You mean like Categorical?
I'm starting to code FiniteDiscreteMeasure.jl with the goal of extending DiscreteNonParametric. Besides CategoricalDistributions.jl, do you know of any initiative in this direction? I'm trying to code the package using CategoricalDistributions as a model, but I'm still trying to understand.
@devmotion , I've created a package extending the DiscreteNonParametric distribution. The package is fully compatible with Distributions.jl. It's here EmpiricalMeasures.jl. I'm already sending it to be registered.
@zsteve
@davibarreira Sorry, I was busy with other things last week and missed the messages here. My main point is that
DiscreteNonParametric
So it's a bit unfortunate that this package does not address these points and is about to be registered.
Maybe you can make a PR to Distributions and then I can explain how it could be generalized?
I think I still can stop the PR from registering the package.
I totally agree with you about making it more general, but I decided to go ahead with a multivariate version first, cause it's already quite useful and I wanted to get a hang of how I would do it. I was not able to figure out how to properly do a more general version without much more work. Also, I was worried a PR would just stand there without ever getting merged, cause I've opened the issue there and talked about it on Zulip and Slack, and no one from Distributions.jl seemed to be very interested. :/
But I can always just cut part of my code and submit as a PR.
Although, if I understand Distributions.jl correctly. They have a hardline between Univariate and Multivariate. So my code just followed this idea. I just created a MvDiscreteNonParametric
distribution.
@devmotion , let me know if you think I should pull the plug for the package. I intend to make it compatible also with MeasureTheory.jl, so there might be a reason for it even if it get's incorporated to Distributions.jl.
Although, if I understand Distributions.jl correctly. They have a hardline between Univariate and Multivariate. So my code just followed this idea. I just created a MvDiscreteNonParametric distribution.
These two variate forms (as they are called in Distributions) are closely related: Distributions uses
const Univariate = ArrayLikeVariate{0}
const Multivariate = ArrayLikeVariate{1}
const Matrixvariate = ArrayLikeVariate{2}
This is exploited e.g. in https://github.com/JuliaStats/Distributions.jl/pull/1391. I think, however, it would be possible to allow even just generic VariateForm
s in the definition of the Distribution (i.e., make it a subtype of Distribution{VF,Discrete}
) where VF<:VariateForm
). One just has to make VF
a type parameter of the distribution to be able to use this supertype.
I intend to make it compatible also with MeasureTheory.jl, so there might be a reason for it even if it get's incorporated to Distributions.jl.
There are ongoing discussion for extracting a common lightweight interface for measures (in the MeasureTheory sense) and to support it in Distributions. Ideally, this would add support for MeasureTheory more or less automatically for distributions in Distributions (currently MeasureTheory already supports Distributions but using some heuristics; it would be better if Distributions adds the required definitions in the package directly).
I think it's too early to say if/what advantages would be of a separate package. But I think that currently a PR to Distributions would be preferable.
Ok. I've stopped the registration. I'll submit the code as a PR to Distributions.
Maybe you can make a PR to Distributions and then I can explain how it could be generalized?
I've submitted the PR.
We've exported the
discretemeasure
function that creates aDiscreteNonParametric
distribution from Distributions.jl orFiniteDiscreteMeasure
which we've implemented. As of now, if we do something likeIt'll work, but only for the 1D case. I'd like to extend this to something like:
Here, the cost matrix would be internally constructed.
It would be nice to have similar functionality for the other functions in the package, where the user could pass a cost function and the measures:
What do you guys think?