Open LilithHafner opened 2 weeks ago
Even better, diff(A::AbstractRange)
could be implemented as r[begin:end-1] .- r[begin+1:end]
, which gives back the range:
julia> rangediff(r) = r[begin:end-1] .- r[begin+1:end]
rangediff (generic function with 4 methods)
julia> rangediff(0:.1:1)
StepRangeLen(-0.1, 0.0, 10)
julia> println(collect(ans))
[-0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1]
This is in keeping with how we do lots of range arithmetic — it's O(1) where possible, even if it means that the values are different than the collect
ed versions would be. For example, (0:.1:1) ./ 3
is 0:1/30:1/3
instead of the division of the values the range generates:
julia> (0:.1:1) ./ 3
0.0:0.03333333333333333:0.3333333333333333
julia> (collect(0:.1:1) ./ 3) .- collect((0:.1:1) ./ 3)
11-element Vector{Float64}:
0.0
0.0
0.0
-1.3877787807814457e-17
0.0
0.0
-2.7755575615628914e-17
-2.7755575615628914e-17
0.0
0.0
0.0
So what's the rule? I think it's gotta be a little gray:
Where does this leave searching? No clue.
I have, in the past, though I do not remember the exact discussions, been informed of the following two stylized facts:
is it possible for a clean design to satisfy both? it may have to sacrifice "equally spaced."
Even better,
diff(A::AbstractRange)
could be implemented asr[begin:end-1] .- r[begin+1:end]
, which gives back the range:julia> rangediff(r) = r[begin:end-1] .- r[begin+1:end] rangediff (generic function with 4 methods)
julia> rangediff(0:.1:1) StepRangeLen(-0.1, 0.0, 10)
julia> println(collect(ans)) [-0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1]
I think you mean rangediff(r) = r[begin+1:end] .- r[begin:end-1]
^_^;;
triage defers to a higher power.
Lol, took me a while to realize I'm supposed to be the higher power.
So there are more than two possible semantics:
first(r)
, last(r)
, step(r)
, length(r)
.getindex
approximationIn the case of searchsorted, we currently use a mix of 3 and 4. This means that we can find 0.3
in 0:0:.1:1
, but I can't find the theoretical 3//10
— instead, both 3//10 and 4//10 correctly land between 0.3
and 0.4
. That works, and it's "just" an optimization. But it assumes that the rounded range outputs (4) are monotonically increasing, which is why it falls apart with 1f8
:
julia> searchsorted(1f8:1f8+8, 100000000)
1:1
julia> searchsorted(1f8:1f8+8, 100000001)
2:1
julia> searchsorted(collect(1f8:1f8+8), 100000001)
6:5
I think the rule has got to be about what the particular function returns. search/find/etc return indices — indices that are specific to the given range. Thus those indices must be able to be used as indices in the range and get the values they found.
Conversely, simple range arithmetic returns other ranges. As long as the start/stop and length are correctly computed, I think it's ok for the collect
ed variants to give slightly different numeric answers. It's not terribly unlike different summation algorithms, really.
search/find/etc return indices — indices that are specific to the given range. Thus those indices must be able to be used as indices in the range and get the values they found.
Amusingly Horrifyingly, though, I don't see any feasible way to get around this disconnect:
julia> r = 1f8:1f8+12
1.0f8:1.0f0:1.0000002f8
julia> idxs = searchsorted(collect(r), 100000000) # Assuming searchsorted returns the range outputs that match
1:5
julia> r[idxs]
1.0f8:1.0f0:1.0f8
julia> all(==(100000000), r[idxs])
true
julia> r[idxs] .- 100000000
0.0f0:1.0f0:4.0f0
julia> collect(ans)
5-element Vector{Float32}:
0.0
1.0
2.0
3.0
4.0
Theoretical question: Is the range
0.0:0.1:1.0
A) a performance optimization that should behave identically to the vector[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
or B) does it represent the 11 real numbers evenly spaced between 0 and 1, including both endpoints?Practical implications: Under A,
diff(0:.1:1)
is[0.1, 0.1, 0.09999999999999998, 0.10000000000000003, 0.09999999999999998, 0.09999999999999998, 0.09999999999999998, 0.10000000000000009, 0.09999999999999998, 0.09999999999999998]
(current implementation) Under B, a performance and accuracy improvement would be to definediff(x::AbstractRange) = fill(step(x), length(x)-1)
which would return[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
.Under A,
searchsorted(1f8-10:1f8, 1f8)
should return5:9
Under B,searchsorted(1f8-10:1f8, 1f8)
should return9:9
(current implementation, see #34408)Under A,
collect
ing a range before performing an operation should never change results Under B, a range can be more precise than thecollect
ed vector so it is possible forcollect
to introduce rounding errors which change results.