Open goretkin opened 3 years ago
Good catch. I guess the reason for the current behavior is to have first(keys(x)):last(keys(x)) == keys(x)
. But it's not great that first(keys(x)) < last(keys(x))
even when x
is empty.
A simple fix would be to use checkbounds(Bool, A, i)
instead. Would you make a PR to do that (with a test)?
More fundamentally I'm not sure why findnext
doesn't throw an error when an out of bounds index is passed. Maybe something to revisit for 2.0.
Would you make a PR to do that (with a test)?
Sure thing.
More fundamentally I'm not sure why
findnext
doesn't throw an error when an out of bounds index is passed.
I don't have a justification handy (@mbauman comes to mind, so maybe they do), but I suppose this choice is made to be consistent with first
on empty keys
, exactly so that findnext
can work in the empty case where there is no in-bounds index.
One guiding desire is
findfirst(x) == findnext(x, first(keys(x))
which is true right now by definition, on AbstractArray
, but not in spirit:
julia> invoke(findfirst, Tuple{Any}, fill(true, (0, 2)))
julia> invoke(findfirst, Tuple{Any}, fill(true, (2, 0)))
julia> findfirst(fill(true, (2, 0)))
julia> findfirst(fill(true, (0, 2)))
ERROR: BoundsError: attempt to access 0×2 Array{Bool,2} at index [1, 1]
The choice of what first
and last
does on these empty collections in peculiar to begin with. For example:
julia> first(keys(zeros(0, 2)))
CartesianIndex(1, 1)
julia> first(collect(keys(zeros(0, 2))))
ERROR: BoundsError: attempt to access 0×2 Array{CartesianIndex{2},2} at index [1]
which I bet is chosen to align with:
julia> first(LinearIndices(fill(0, (0, 2))))
1
julia> firstindex(fill(0, (0, 2)))
1
It's peculiar that first(empty_thing)
and last(empty_thing)
return anything at all, such that you can't distinguish between
julia> first(keys(fill(0, (1, 2))))
CartesianIndex(1, 1)
julia> first(keys(fill(0, (0, 2))))
CartesianIndex(1, 1)
Here's one explanation: https://github.com/JuliaLang/julia/issues/34697#issuecomment-624063133
@nalimilan I am not sure if checkbounds
is correct here.
Current behavior:
julia> findnext(fill(false, 2, 2), CartesianIndex(-1, -1))
ERROR: BoundsError: attempt to access 2×2 Matrix{Bool} at index [-1, -1]
If I understood your suggestion with checkbounds
, it would instead return nothing
.
Mind the Bool
in checkbounds(Bool, A, i)
.
Perhaps you understood that BoundsError
to be coming from the corrected code. But it's from the implementation on master
.
Let me know if this clarifies it
function _findnext(A, start)
l = last(keys(A))
i = oftype(l, start)
# i > l && return nothing # original line
checkbounds(Bool, A, i) || return nothing
while true
A[i] && return i
i == l && break
# nextind(A, l) can throw/overflow
i = nextind(A, i)
end
return nothing
end
_findfirst(A) = _findnext(A, first(keys(A)))
#
findfirst(fill(false, 0, 2)) # broken
findfirst(fill(false, 2, 0)) # good
_findfirst(fill(false, 0, 2)) # the change fixed this, good 👍
_findfirst(fill(false, 2, 0)) # this was working before, and still is.
findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # throws `BoundsError`, good
_findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # does not. newly broken. 👎
Instead of:
i > l && return nothing
what about simply doing:
(i > l || isempty(A)) && return nothing
(Edit: oh, I see, this will similarly not throw BoundsErrors like you want)
I think it's just fine — and a feature — that firstindex
might return an index out of bounds. Same goes for lastindex
.
I agree it feels unsatisfactory that first(keys(x1)) > last(keys(x1))
but first(keys(x2)) < last(keys(x2))
when they're both empty arrays... but I'm afraid there's not much we can do there.
Thinking about this more, I don't think we want to throw bounds errors from findnext
. Again, go with the linear indexing version — there i > lastindex(A)
is a 100% valid bounds check. The reason for the BoundsError
is exactly the same cause as the initial bug.
I wrote
findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # throws `BoundsError`, good
_findnext(fill(false, 2, 2), CartesianIndex(0, 0)) # does not. newly broken. 👎
But I actually wasn't sure whether it should or should not throw BoundsError
. I was just observing that the existing code checks the bounds only in one direction (i > l
), and that checkbounds
checks both directions (ub > i > lb
).
Would it be non-breaking to make findnext(fill(false, 2, 2), CartesianIndex(0, 0))
return nothing
?
Probably others already know this, so consider it a note-to-self.
I had suspected that there might be something wrong with the ordering on CartesianIndex
. Not that there would be room to change it in 1.0, but I wanted to know if there was any inconsistency. To be sure, I did the following exercise:
import OffsetArrays # for `fill` to work on `UnitRange`
"""
Clumsy implementation based on definition in https://github.com/JuliaLang/julia/pull/16054 :
`isless(I1, I2)` returns true if `I1` would be visited before `I2` during `CartesianRange` iteration
"""
function isless2(I1::CartesianIndex, I2::CartesianIndex)
ranges = map(x->UnitRange(x...), minmax.(Tuple(I1), Tuple(I2)))
A = fill(nothing, ranges)
for i = CartesianIndices(A)
i === I1 && return true
i === I2 && return false
end
error()
end
@show isless(CartesianIndex(1, 1), CartesianIndex(0, 2))
@show isless(CartesianIndex(1, 1), CartesianIndex(2, 0))
@show isless2(CartesianIndex(1, 1), CartesianIndex(0, 2))
@show isless2(CartesianIndex(1, 1), CartesianIndex(2, 0))
So indeed the implementation of isless(::CartesianIndex, ::CartesianIndex)
is true to its meaning.
Again, go with the linear indexing version — there
i > lastindex(A)
is a 100% valid bounds check
Could you elaborate on this? I think I missed some context.
My point is simply that for integer indices, there aren't values between firstindex
and lastindex
that are out of bounds in the way that there are for CartesianIndex
.
Thanks for clarifying. I am not sure I've understood what (or whether) we've converged on in terms of desired behavior. Note that there seems to be an issue with the bounds checking that is unrelated to degenerate array dimensions.
I wrote some tests here: https://github.com/JuliaLang/julia/pull/36967/files#diff-f5dc554ce02d9b8c6323d7a7e8077ccaR594
@nalimilan @mbauman do those tests encode the behavior we want?
Thinking about this more, I don't think we want to throw bounds errors from findnext
The tests show situations where findnext
and co do currently throw BoundsError
. They seem desireable to me, and also perhaps necessary to ensure compatibility within 1.0
OK so we currently have an asymmetry in the integer bounds check (we throw an error when it's too low, but return nothing
when it's too high). I guess this behavior makes sense if you want to loop over the result until you get nothing
, without having to check bounds manually again.
So regarding your tests, I'm not sure we really want findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing
, as CartesianIndex(3, 1)
isn't an index you can get by calling nextind(fill(true, (2, 2)), i)
if you start with a value of i
which is valid. I'd only keep tests that ensure that we return nothing
on first(keys(x))
and on nextindex(last(keys(x)), x)
.
Regarding the implementation, maybe we should just do (i > l || (isempty(A) && i >= first(A)) && return nothing
?
I'm not sure we really want
findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing
You mentioned the too-low-vs-too-high asymmetry which is showing up because the fix for the issue should either intentionally preserve the current behavior, or intentionally improve it.
Another asymmetry that currently exists, that I'd like to get rid of, is that these do different things:
@test findnext(fill(true, (2, 2)), CartesianIndex(1, 3)) === nothing
@test findnext(fill(true, (2, 2)), CartesianIndex(3, 1)) === nothing # 👎
@test findnext(fill(true, (2, 2)), CartesianIndex(3, 3)) === nothing
Foremost, I believe they should either all throw or all evaluate to nothing
. Is that something we can agree on?
If I understood correctly, CartesianIndex(1,3)
isn't an index you would get from nextind(fill(true, (2, 2)), i)
with a valid i
, but I can't tell whether you feel it should keep returning nothing
or if it should also throw BoundsError
.
If I understood correctly,
CartesianIndex(1,3)
isn't an index you would get fromnextind(fill(true, (2, 2)), i)
with a validi
, but I can't tell whether you feel it should keep returningnothing
or if it should also throwBoundsError
.
nextind(fill(true, (2, 2)), last(keys(x))) == CartesianIndex(3,3)
so you're going to pass that index if you do this:
x = fill(true, (2, 2))
i = first(keys(x))
while (v = findnext(x, i)) !== nothing
i = nextind(x, i)
...
end
So returning nothing
is somewhat useful in that case. And anyway we can't start throwing an error before 2.0 as it would break code.
OTC, CartesianIndex(3, 1)
and ~CartesianIndex(3, 1)
~CartesianIndex(3, 3)
cannot be obtained by calling nextind(x, i)
so there's no reason to stop throwing an error there.
You probably meant
nextind(fill(true, (2, 2)), last(keys(x))) == CartesianIndex(1, 3)
and
OTC, CartesianIndex(3, 1) and CartesianIndex(3, 3)
That helps me understand better what's going on. Thanks for explaining.
This behavior strikes me as inconsistent:
I had expected both calls to
findnext
to returnnothing
.Honing in, the inconsistency arises because of
and so the check at https://github.com/JuliaLang/julia/blob/4c1e3c039775e56c72a4a638db0a8a20f4972dd3/base/array.jl#L1726 passes through.
I believe the intention in
findnext
is to compareCartesianIndex
s in a way that agrees with the corresponding comparison on theLinearIndex
into the array. For example:(I tried on Julia 1.4 and Julia 1.5)