JuliaApproximation / DomainSets.jl

A Julia package for describing domains as continuous sets of elements
MIT License
72 stars 12 forks source link

Can StaticArrays dependency be made optional? #91

Open cscherrer opened 2 years ago

cscherrer commented 2 years ago

From https://github.com/cscherrer/MeasureBase.jl/issues/16:

DomainSets currently depends on StaticArrays, which is a heavy dependency for a core library like this. Would it be possible to refactor DomainSets somewhat to use Static.jl instead, possibly with an optional StaticArrays dependency? Or maybe there's a better approach?

daanhb commented 2 years ago

Hmm, I did not think of StaticArrays as a heavy dependency, but perhaps it is. For the time being, the main reason for DomainSets being larger than it should be is that a significant part of the code has to do with maps (function mappings). We plan to separate that out, though it will remain a dependency.

I'm afraid quite a bit of code is specific to StaticArrays, in particular when dealing with product domains. E.g. you typically want the product of a rectangle with an interval to behave like a cube with simple vector elements, rather than some nested vector. We support both, but static arrays are given preferential treatment in many places.

That said, I don't know Static.jl, but it might work.

cscherrer commented 2 years ago

In most contexts it's really not. But in development of MeasureTheory, there was consistent feedback that we needed a lightweight "core" library with fewer dependencies; this led me to factor out MeasureBase.jl. To me, having StaticArrays as a dependency is not such a big deal. But I've been miscalibrated on this before, so I want to be cautious about it.

Static.jl is really nice! You don't need separate types for static vs dynamic size information, for example you can have

struct Foo{T}
    dimension::T
end

Then an instance might be ::Foo{Int} in the dynamic case or ::Foo{StaticInt{3}} in the static case. And it's especially nice that StaticInt{N} <: Integer, so most calculations "just work".

Also worth checking out is ArrayInterface which uses Static: https://juliaarrays.github.io/ArrayInterface.jl/dev/#Static-Traits

cscherrer commented 2 years ago

Hi @daanhb ,

Thinking some more about this, the bigger challenge might be that products seem to always be represented using SArrays. These are really fast for small arrays, but the speed comes from an internal Tuple representation, and anything using them is aggressively unrolled. For statistical applications, a lot of the work is in high dimensions. So StaticArrays are nice to have as an option (when there's some subarray you want to speed up), but using them everywhere can lead to explosions in compile time.

Do you think there could be a way to extend this so other array formats can be used? Or maybe this is too far outside the goals of the package?

daanhb commented 2 years ago

That is definitely within scope. That is in fact one of the reasons there are three subtypes of ProductDomain: they differ in the way the composing domains are stored and in how the elements of the domain are represented. The abstract constructor ProductDomain tries to return the most appropriate subtype, based on the types of the arguments. In principle there is nothing to enforce that the domains are represented by a tuple or a static array, and if there is that would be a bug. The construction of product domains is a bit mysterious though, so it may not be straightforward to get exactly the representation you want.

Some examples:

julia> d1 = ProductDomain(map(Interval, rand(10), rand(10)))
D₁ × D₃ × D₂ × ... × D₄

D₁ = 0.04102775113924917..0.7543070839059667
D₂ = 0.0301860782231258..0.3334968805199827
D₃ = 0.1203453376504664..0.31294581525887777
D₄ = 0.8943245774798796..0.021646286220368793

julia> typeof(d1)
Rectangle{Vector{Float64}}

julia> eltype(d1)
Vector{Float64} (alias for Array{Float64, 1})

The ProductDomain is given a vector of Intervals and decides to make it a rectangle, whose elements have type Vector.

julia> d2 = ProductDomain(0..1, UnitCircle())
(0.0..1.0) × UnitCircle()

julia> typeof(d2)
VcatDomain{3, Float64, (1, 2), Tuple{ClosedInterval{Float64}, EuclideanUnitSphere{2, Float64}}}

julia> eltype(d2)
StaticArrays.SVector{3, Float64} (alias for StaticArrays.SArray{Tuple{3}, Float64, 1, 3})

julia> d2.domains
(0.0..1.0, UnitCircle())

Here, the constructor is given two domains. The best guess is that the user wants to concatenate the 1D domain and the 2D domain, hence the result is a VcatDomain with elements of type SVector{3}. If concatenation is not desired, you can explicitly invoke the TupleProductDomain constructor, in which case the elements will be tuples:

julia> TupleProductDomain(0..1, UnitCircle())
(0..1) × UnitCircle()

julia> eltype(ans)
Tuple{Int64, StaticArrays.SVector{2, Float64}}

Now, on to vectors of domains:

julia> ProductDomain(map(x -> x*UnitCircle(), 1:5))
Sphere(1.0, [0.0, 0.0]) × Sphere(2.0, [0.0, 0.0]) × Sphere(3.0, [0.0, 0.0]) × Sphere(4.0, [0.0, 0.0]) × Sphere(5.0, [0.0, 0.0])

julia> eltype(ans)
Vector{StaticArrays.SVector{2, Float64}} (alias for Array{StaticArrays.SArray{Tuple{2}, Float64, 1, 2}, 1})

This domain is the concatenation of circles with increasing radius. The element type is a Vector of SVector{2}s, because SVector{2} is the element type of a UnitCircle. You can also make spheres with Vector elements using UnitSphere(2). Compare:

julia> eltype(UnitSphere(Val(2)))
StaticArrays.SVector{2, Float64} (alias for StaticArrays.SArray{Tuple{2}, Float64, 1, 2})

julia> eltype(UnitSphere(2))
Vector{Float64} (alias for Array{Float64, 1})

julia> ProductDomain(map(x -> x*UnitSphere(2), 1:5))
Sphere(1.0, [0.0, 0.0]) × Sphere(2.0, [0.0, 0.0]) × Sphere(3.0, [0.0, 0.0]) × Sphere(4.0, [0.0, 0.0]) × Sphere(5.0, [0.0, 0.0])

julia> eltype(ans)
Vector{Vector{Float64}} (alias for Array{Array{Float64, 1}, 1})

The representation of the factors of a product domain is mostly based on how ProductDomain is invoked. When given a vector of domains, it will default to a VectorProductDomain.

There are some trade-offs. The reason tuples are the default when possible is that they are more efficient for mixed types. A VectorProductDomain holds a Vector of domains, which must be of the same type or the eltype of that vector will be some abstract type - working with that type might be slow. Also, there is a restriction currently that VectorProductDomains contain a vector of domains with the same dimension. Concatenating domains of different dimensions into a Vector, like VcatDomain does with static arrays, would be difficult to implement efficiently.

daanhb commented 1 year ago

Well it looks like we can depend on StaticArraysCore.jl instead, without any other changes. It improves load time by about 0.5s on my machine (a reduction of more than 50%).

Anyone working with higher-dimensional domains, and who hasn't separately loaded StaticArrays, will get an uninformative error message though (like size is not defined for AbstractVector).