Closed zsunberg closed 2 years ago
I still have the same thoughts in this comment (https://github.com/JuliaMath/IntervalSets.jl/issues/115#issuecomment-1219278427), and I think the proposal is undesirable.
This is similar to the way
1:0.01:2
expands to1.0:0.01:2.0
.
1:0.01:2
includes a floating point number, but 1..2
does not. So the analogy does not seem to apply here.
The docstring for
Domain{T}
says that it is a "subset of type T"
I think the docstring is misleading and should be fixed. The type parameter T
just represents internal precision, but not for a mathematical set. (https://github.com/JuliaMath/IntervalSets.jl/issues/115#issuecomment-1219874621)
(actually any
a::Integer..b::Integer
will be converted toInterval(float(a), float(b)))
.
How about half(3) .. half(5)
(from HalfIntegers.jl) and 1u"m" .. 2u"m"
(from Unitful.jl)? These types are not Integer
but should be converted with float
for consistency. It is still hard to detect whether the given type is (mathematically) dense or not. (https://github.com/JuliaMath/IntervalSets.jl/issues/115#issuecomment-1219689209)
Calling
rand
on a set should return an element of that set, hencerand(Interval(1, 2))
should return anInt
.
Does this imply Interval(1,2)
is a set of integers with 1 ≤ x ≤ 2? (i.e. Interval(1,2)
is equal to 1:2
?) If so, should Interval(1,2) == Interval(1.0, 2.0)
be true
? How about rand(Interval(1u"m",2u"m"))
? Should 1.5 in Interval(1,2)
be false
? These breaking changes make a lot of confusing, I guess.
I think what we have to do are:
boundstype(1..2)
eltype(1..2)
Domain{T}
The behavior of eltype
sometimes seems incorrect.
julia> 1.0 in [1,2,3]
true
julia> 1.0 isa eltype([1,2,3])
false
So, I think the correct meaning of eltype
is "The element of x
can be converted to eltype(x)
without loss".
i.e. 1.0 in [1,2,3]
does not imply 1.0
is a Int
, but 1.0
can be converted to Int
loss-less.
Here's another example:
julia> 1 in Float32[1,2,3]
true
julia> 1.0 in Float32[1,2,3]
true
julia> nextfloat(1.0) in Float32[1,2,3]
false
julia> Float32(nextfloat(1.0)) in Float32[1,2,3]
true
In this concept, eltype(1.0 .. 2.0) == Float64
is still inappropriate even if real numbers are typically represented by Float64
in Julia.
julia> a = nextfloat(BigFloat(1.2)) # cannot converted to Float64 loss-less
1.199999999999999955591079014993738383054733276367187500000000000000000000000017
julia> Float64(a) == a
false
julia> a in 1.0 .. 2.0
true
I think the most correct definition of eltype(1.0 .. 2.0)
is Real
, but it is less informative and less useful, and it will be hard to implement such eltype
for non-real intervals. (Should we define all eltype
for each type such as Date(2022,08,02) .. Date(2022,09,01)
? It seems okay to keep eltype(::Interval)
not defined.)
rand(Interval(1,2)) should return an Int
I disagree with this completely. Interval(1,2)
still contains all real numbers between 1 and 2.
I'm still of the opinion that integer endpoints should remain integers. In particular, we are using eltype
to measure the precision of the definition of the set. Leaving it an Int
means that the endpoints are exact (mod 2^64). Converting it to a Float64
means we are only saying it's accurate up to eps()/2
.
I think the real issue is the use of eltype
. I do think eltype
should be renamed however and eltype
should not be implemented for intervals.
rand(Interval(1,2)) should return an Int
I disagree with this completely.
Interval(1,2)
still contains all real numbers between 1 and 2.
Which type should rand(Interval(1,2))
return then, and how do I find out about this before calling the function?
I think if rand
is implemented it would be sensible to return a float(eltype(Interval(1,2)))
Okay. That sort of implies that we settle on a prectype
of an Interval{T}
that is equivalent to float(T)
. A function like that might as well be explicitly defined.
I don't mind the outcome of this discussion, but by now I do feel strongly about having the eltype
of Domain{T}
be T
.
I would actually say the opposite: prectype(::Domain{T}) = T
and eltype(::Domain{T}) = float(T)
. This would make the rand
example make more sense.
If there is a function that returns the T
of Domain{T}
, and I think there should be, then these are the candidates so far:
boundstype
: works fine for intervals, but does not generalize to other domains (unbounded domains, for example) and even for intervals we still seem to want some kind of precision associated with them and T
is all we have to base it onnumtype
, prectype
: works fine for intervals, but does not generalize to T <: AbstractVector
eltype
: works fine for most cases. However, domains are not defined as a collection of elements but rather by some mathematical condition which can be satisfied by multiple types with different sets of associated possible values. It leads to a particularly weird user experience for intervals with integer endpoints.I think the pull request aimed to improve the user experience, without changing the fundamental issues.
Of course some of the problems would go away if this package did not define Domain{T}
but, apart from other problems, from my point of view that just moves the issue to DomainSets.jl
instead of addressing it.
I think we are free to define ourselves what eltype
means for continuous sets, regardless of what it means for discrete sets, as long as (i) there is some conceptual notion that persists and (ii) it can be distinguished cleanly from other meanings based on type signatures only.
For this package, I'd say eltype
is the type of points that you can expect in a discrete set that mimicks the continuous one. From that perspective, if one absolutely wants intervals with integer endpoints, even when the explicit intention is to represent the continuous points in between, you'd need a type with two type parameters. (I think the underlying reason we are having this argument is that everybody just wants to avoid that.)
Also, to add to @hyrodium's convincing list of examples above, the eltype of an interval is not Real
either:
julia> 2.0+0im ∈ 1..2
true
the
eltype
of an interval is notReal
either:
Yes exactly. What I mean is that e in eltype(I)
implies convert(eltype(I), e) == e
(Or eltype(I)(e) == e
).
julia> Real(1+0im) == 1+0im
true
Yes exactly. What I mean is that
e in eltype(I)
impliesconvert(eltype(I), e) == e
(Oreltype(I)(e) == e
).
Yes, and I agree that by that definition it is impossible to correctly define eltype
for intervals. You clearly showed that. This is even unrelated to having integer endpoints or not, it remains true if you mix Float64
and BigFloat
. It is just a fundamental issue with continuous sets. I guess that my point is: rather than doing away with eltype
for that reason, let's keep it and change its definition. The docstring in Base is clearly specific to iterable sets, and continuous domains just aren't iterable. (If you wanted to "mimick" iteration over points in the set, then T
would, should or could be the eltype
of that discrete approximation. That's my current thinking. In fact I'm now convinced that intervals with integer eltype
should be disallowed, even in JuliaMath.)
I somewhat like the suggestion above to define eltype(::Domain{T}) = float(T)
, but it feels weird not to use T
itself for that purpose, because then T
loses its generic meaning. The T
does not stand for "the data type we use to describe a domain". That is an implementation detail of a concrete domain, not a generic notion. If that's the purpose of T
, we might as well define Domain
rather than Domain{T}
. That becomes very hard very quickly to work with.
It might work for intervals and I'm fine with that - we'll just adapt in the DomainSets package.
Personally I think IntervalSets.jl and DomainSets.jl should be rewritten so that a domain is an interface not an abstract type. That is, a domain is anything that implements in
.
Where is eltype
even used in IntervalSets.jl and DomainSets.jl? I'm not convinced its needed anywhere.
It's used in ApproxFun.jl but that can be easily changed by implementing prectype
there.
I checked last time the discussion came up, and IntervalSets doesn't use eltype
at all :-)
Personally I think IntervalSets.jl and DomainSets.jl should be rewritten so that a domain is an interface not an abstract type. That is, a domain is anything that implements
in
.
Currently, Julia does not lend itself very well to that idea in my experience. There is a hidden trade-off. If you can't say something is a domain based on its type, then any indication that you want to treat arguments to a function as a domain has to be in the function name. That is why DomainSets already has indomain
, uniondomain
, setdiffdomain
, etcetera, in addition to specializing in
, union
and setdiff
for Domain
arguments. Of course nobody uses those obscure functions. So, in trying to improve interoperability with other packages and types by not relying on "inheriting from Domain{T}", which is the noble goal, you also reduce interoperability again by using non-standard names for any non-trivial functionality.
I think this is why so many package ecosystems have a few core abstract types, including famous ones like AbstractArray{T}
. Note the T
:-)
Actually, it seems like the best solution might be
abstract type AbstractInterval{T, E} <: Domain{T} end
Then E
could represent the endpoint types and T
could be a type convenient for representing members. (I think @daanhb talked about this above, but I just wanted to make it explicit - it does not seem that bad to me)
Here are my responses to some of the more important questions above:
Where is eltype even used in IntervalSets.jl and DomainSets.jl? I'm not convinced its needed anywhere.
As I mentioned in the other thread, eltype
is extremely handy when the input to an algorithm is a set and the algorithm needs to use elements of that set. For example the action space in reinforcement learning is a set, and the agent needs to use objects that represent individual actions.
Personally I think IntervalSets.jl and DomainSets.jl should be rewritten so that a domain is an interface not an abstract type.
@dlfivefifty I feel the same way, though I am considering @daanhb's arguments so I have not formulated an opinion. @daanhb if there is more literature to read on this, I'd certainly be interested.
I think this is why so many package ecosystems have a few core abstract types, including famous ones like AbstractArray{T}. Note the T
Keep in mind that there is no Iterable{T}
, however.
Personally I think IntervalSets.jl and DomainSets.jl should be rewritten so that a domain is an interface not an abstract type.
@dlfivefifty I feel the same way, though I am considering @daanhb's arguments so I have not formulated an opinion. @daanhb if there is more literature to read on this, I'd certainly be interested.
The reasoning rapidly leads to formalizing interfaces. I'm not sure what the literature is on this topic. It seems likely that the trade-off I mentioned is a known thing though. Anyway, this goal suggests the exercise of formulating an interface for Domains, like the one for iterable objects, and see what kind of functionality can be made from that. But let's not do that here.
Actually, it seems like the best solution might be
abstract type AbstractInterval{T, E} <: Domain{T} end
I think this is the most sensible proposal so far.
We should come up with a well-defined definition of eltype
. The best I can come up with is "eltype(::Domain)
returns the type of rand(::Domain)
" but that's less than ideal....
Then
E
could represent the endpoint types andT
could be a type convenient for representing members.
How can we define the type parameter T
? If E <: Real
, then T
can be defined as T = float(E)
, but our support is not just real intervals. For example, it will be hard to define T
from "a" .. "bcd"
and Date(2022,09,05) .. Date(2022,09,08)
.
The best I can come up with is "
eltype(::Domain)
returns the type ofrand(::Domain)
" but that's less than ideal....
Yes, that's not ideal. The rand
function for an interval is defined only for Real
. (I'm not sure if we need eltype
for non-real intervals)
That's the wrong way round. for 1..2 == Interval(1,2)
we have E == Int
and T == float(Int) == Float64
. The catch is that would also require adding an extra type parameter to ClosedInterval
which is not ideal.
How can we define the type parameter
T
? IfE <: Real
, thenT
can be defined asT = float(E)
, but our support is not just real intervals. For example, it will be hard to defineT
from"a" .. "bcd"
andDate(2022,09,05) .. Date(2022,09,08)
.
Since in this scenario T
is the eltype, which depends on what the user wants to do, it should be possible for the user to choose it. It is what you expect the type of x
to be in typical usage of the inequality a <= x <= b
. A default is needed for the a..b
notation. Probably T=E
is a good default for most cases, including strings and dates. That is the current situation. For numerical intervals, T=float(E)
seems a good default.
I agree that adding type parameters is not ideal, especially if they are pervasive, i.e., all subtypes have them.
Note that DomainSets.jl seems to use the type parameter mostly for conversion. E.g. convert(Domain{ComplexF64}, 1..2)
.
Why this needs to be done needs to be thought through if we want to do a major redesign.
Warning: long post.
The thing that stings about this solution is that there are too many type parameters. This is a typical problem: generality leads to more parameters, but the common case is simple. In the current setting, in most cases T=E
would be just fine, but all intervals suffer the burden of having two type parameters because a few intervals actually need them.
There is a way to reconcile both, but it requires a bit more programming and, as always, just one more layer of abstraction. The design pattern goes as follows.
You make an abstract type with two type parameters. Here, that might be
abstract type AbstractInterval{T,E} end
All functionality of intervals is implemented for this type, in terms of T
(which is the eltype
), E
(which is the boundstype
), leftendpoint(d)
and rightendpoint(d)
. To start with, you'd write:
eltype(::Type{<:AbstractInterval{T,E}}) where {T,E} = T
boundstype(d::AbstractInterval) = boundstype(typeof(d))
boundstype(::Type{<:AbstractInterval{T,E}}) where {T,E} = E
The functionality for intervals follows, always for AbstractInterval
's.
Then, at the end, you do:
struct SimpleInterval{T} <: AbstractInterval{T,T}
a :: T
b :: T
end
leftendpoint(d::SimpleInterval) = d.a
rightendpoint(d::SimpleInterval) = d.b
struct ComplicatedInterval{T,E} <: AbstractInterval{T,E}
a :: E
b :: E
end
leftendpoint(d::ComplicatedInterval) = d.a
rightendpoint(d::ComplicatedInterval) = d.b
Both concrete intervals "just work" with all functionality of intervals. A simple interval is one where the eltype
and boundstype
are the same. Hence, it needs only one type parameter.
The ..
operator invokes one of the two concrete constructors, depending on the types of the points. So 2..3
might return a ComplicatedInterval{Float64,Int}
, but 2.0..3.0
just returns a SimpleInterval{Float64}
. This is type stable and most users will never see the complicated interval type.
Then, somebody else in some other package could do:
struct MySpecialIntegerInterval <: AbstractInterval{Int,Int}
a :: Int
end
leftendpoint(d::MySpecialInterval) = d.a
rightendpoint(d::MySpecialInterval) = 5
This is some interval with an integer left point and 5
as the right point. This interval would "just work" and, as long as the package is well-defined, integrate seamlessly with any other subtype of AbstractInterval{T,E}
.
A bonus is to define the abstract constructor AbstractInterval
like this:
AbstractInterval(a::Int, b::Int) = ComplicatedInterval{float(Int),Int}(a, b)
AbstractInterval(a::T, b::T) where {T <: AbstractFloat} = SimpleInterval{T}(a,b)
The contract is that this constructor always return one of its concrete subtypes. It just depends on the types of the arguments which one you'll get. But it will be optimized for your use case. As a user, you won't need to know what the name of the best fitting type is (and it would be bad practice to write code that depends on the specific concrete subtype). In this case I'd rather call the abstract type Interval
instead of AbstractInterval
. Want an interval with endpoints a
and b
? Just type Interval(a,b)
and you'll have the best one out there.
Finally, the a..b
syntax can just call that abstract constructor.
Both IntervalSets and DomainSets already use this design pattern to some extent. The aspect I want to emphasise here is that you can use it to simplify the types you'll most often be dealing with in the common cases, while still catering to the specialist out there, i.e., you're not giving up on generality.
Here is a working example of what I mean, over in DomainSets. An AffineMap(a,b)
returns a function that computes a*x+b
. It is an abstract constructor that specializes to the types you give it.
Scalars:
julia> using DomainSets
julia> m = AffineMap(2.0, 3)
x -> 2.0 * x + 3.0
julia> typeof(ans)
DomainSets.ScalarAffineMap{Float64}
Arrays and vectors:
julia> AffineMap(rand(2,2), rand(2))
x -> A * x + v
A = [0.6369548300001839 0.6384788626918244; 0.8296441404786224 0.16026239514708762]
v = [0.3984510604775833, 0.2192951933222833]
julia> typeof(ans)
DomainSets.VectorAffineMap{Float64}
Something of the form 2*x + v
in which v
is not a vector but a 2x1
matrix. This is a more general case in which I want to retain the information of the types:
julia> m = AffineMap(2, rand(2,1))
x -> 2.0 * x + A
A = [0.527326545615108; 0.03894785911249521;;]
julia> m([1,2])
2×1 Matrix{Float64}:
2.527326545615108
4.038947859112495
julia> typeof(m)
DomainSets.GenericAffineMap{Matrix{Float64}, Float64, Matrix{Float64}}
Ok, I think the general approach of having the bounds type be (possibly) different from the eltype seems to be the best candidate solution. I will think about the details and propose something in a new issue.
The issue of what does eltype
precisely mean is tricky though. All I know is that I really want it to be Float64
for 1..2
:joy:.
Based on the discussions in #115, I am going to submit a pull request with the following changes:
1..2
will expand toInterval(1.0, 2.0)
(actually anya::Integer..b::Integer
will be converted toInterval(float(a), float(b))
).rand(Interval(1, 2))
will return anInt
The justification for change 1 goes like this:
1..2
, they are imagining a set of real numbers between 1 and 2.1..2
isInterval(1.0, 2.0)
. This is similar to the way1:0.01:2
expands to1.0:0.01:2.0
.Interval(1, 2)
, they can still do it explicitly.The justification for change 2 goes like this:
Domain{T}
says that it is a "subset of typeT
"rand
on a set should return an element of that set, hencerand(Interval(1, 2))
should return anInt
.Maintainers, what do you think about this?