`UnitSimplex` - Githubissues

cscherrer commented 2 years ago

Instead of checking that the sum is one, UnitSimplex seems to check that it's at most one:

julia> [0.3,0.6] ∈ UnitSimplex(2)
true

julia> [0.3,0.7] ∈ UnitSimplex(2)
true

julia> [0.3,0.8] ∈ UnitSimplex(2)
false

It also doesn't seem to check the length (though UnitSimplex(Val(2)) does check this):

julia> [0.3,0.4,0.2] ∈ UnitSimplex(2)
true

julia> [0.3,0.4,0.3] ∈ UnitSimplex(2)
true

julia> [0.3,0.4,0.4] ∈ UnitSimplex(2)
false

EDIT: I just realized the first of these is intentional. This is unusual, maybe a different use of "unit simplex" across fields? In statistics we use this for the support of Dirichlet distributions.

daanhb commented 2 years ago

Thanks for reporting. The length is definitely an issue. For the first examples, I was interpreting the simplex as a volume I guess.

cscherrer commented 2 years ago

How would you express the "stats simplex"?

daanhb commented 2 years ago

Depends - what is the "stats simplex" exactly? :-)

It seems you can ask for the boundary of a simplex, which in 2D currently is the full triangle, hence its boundary is a collection of edges:

julia> boundary(UnitSimplex(Val(2)))
D₃ ∪ D₁ ∪ D₂

D₁ = [-1.0, 1.0] .* (0.0..1.0 (Unit)) .+ [1.0, 0.0]
D₂ = [0.0, -1.0] .* (0.0..1.0 (Unit)) .+ [0.0, 1.0]
D₃ = [1.0, 0.0] .* (0.0..1.0 (Unit)) .+ [0.0, 0.0]

It is a bit clunky though.

cscherrer commented 2 years ago

The simplex I have in mind is https://en.wikipedia.org/wiki/Simplex

cscherrer commented 2 years ago

A few nice things about that definition:

It's more symmetric, in that all edges have the same length. If you include the origin, edges can either be length one or sqrt(2)
It's useful for representing probabilities
It's equivalent topologically, you just need to go up a dimension

Maybe there are fields where it's standard to include the origin? I just haven't seen this formulation before

daanhb commented 2 years ago

I didn't really give it much thought. I wanted at least one polytope in each dimension, so all others could be represented by mapping it. The simplest definition would be best. Also, I'd like a representation both of the volume and of the boundary, that also seems an issue here.

cscherrer commented 2 years ago

By "here" do you mean in the current implementation? I think this is very easy for the definition I'm used to

daanhb commented 2 years ago

Yes, I meant the question of what to implement here in DomainSets. The difference between == 1 and <= 1 confused me, because it seems to refer to the difference between a boundary and a volume (i.e. circle versus disk), but I see now that the standard simplex you have in mind is an n-dimensional object in an (n+1) dimensional space. It does have a simple definition. From the wikipedia reference, I guess I implemented what they refer to as "corner of cube" or a "standard orthogonal simplex", which involves the origin.

Clearly the name UnitSimplex is not a good one if it leads to confusion, and changing the name or the domain is a breaking change, hence I labelled this 0.6.

In practice I'm interested in having a reference triangle that maps to any other triangle. I'm not too keen on making the reference triangle live in a higher dimensional space, as that would introduce a host of numerical issues (the map would be rectangular, and verifying membership becomes sensitive to rounding errors).

cscherrer commented 2 years ago

Yes, rounding errors are unavoidable if you try to do in directly on embeddings. Have you ever done any work with the Stan language? It focuses on Hamiltonian Monte Carlo sampling, which can only sample over ℝⁿ. But we usually want a lot more than than, so they have all kinds of transforms. The approach has become pretty standard.

So we don't usually ask whether a parameter is in whatever manifold we're interested in, but instead have a Markov chain walk around in ℝⁿ and map it into the space we want, so it's there by construction. You do sometimes need to go the other way; in that case, I usually project to the space, maybe with some tolerance if that's an issue.

daanhb commented 2 years ago

Okay. I'm not familiar with Stan, but the approach makes sense. In DomainSets we have in and approx_in with a tolerance, and I do try to avoid mixing them.

As for changes in DomainSets I'd like to keep the current implementation for now because it is practical, but I do wonder about its name. Even if there is just a difference in fields, it's best to avoid confusion. Perhaps it could be OrthogonalSimplex?

@dlfivefifty what is a "unit simplex" to you? See https://en.wikipedia.org/wiki/Simplex#The_standard_simplex

daanhb commented 2 years ago

(the issue was closed by my recent commit, but that only fixes the length-bug, not the naming)

JuliaApproximation / DomainSets.jl

`UnitSimplex` #102