Open ghyatzo opened 1 month ago
I just noticed that
_safe_nth(itr, n) = begin
    y = iterate(Base.Iterators.drop(itr, n-1))
    ifelse(isnothing(y), nothing, getindex(y, 1))
end
is as fast as your _inbounds_nth.
julia> @btime _safe_nth(itr, 9999) setup=(itr=collect(1:10000))
161.977 ns (0 allocations: 0 bytes)
9999
Actually, I'm a bit confused that ordinary branching has such a high cost.
That is great, I didn't know about ifelse!
The performance disparity might be due to the fact that ifelse is a normal function call, so it evaluates all of its arguments beforehand, which might help eliminate the branching altogether?
At this point there isn't really a reason to have a "safe" and an "unsafe" version. We might as well always check for nothing and have the best of both worlds.
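One caveat worth checking with the ifelse version: because ifelse is an ordinary function, both value arguments are evaluated before the call, so getindex(y, 1) also runs when y is nothing. The sketch below contrasts the two forms. The names eager_nth and branching_nth are hypothetical and used only for illustration.

```julia
# Hypothetical names, for illustration only.

# ifelse is a regular function: getindex(y, 1) is evaluated even when y === nothing.
eager_nth(itr, n) = begin
    y = iterate(Iterators.drop(itr, n - 1))
    ifelse(isnothing(y), nothing, getindex(y, 1))
end

# The ternary operator short-circuits, so y[1] only runs when y is a tuple.
branching_nth(itr, n) = begin
    y = iterate(Iterators.drop(itr, n - 1))
    y === nothing ? nothing : y[1]
end

# In bounds, both versions agree.
eager_nth(1:10, 3), branching_nth(1:10, 3)

# Out of bounds, only the branching version returns nothing;
# the eager version fails because there is no getindex method for Nothing.
branching_nth(1:10, 11)
eager_threw = try
    eager_nth(1:10, 11)
    false
catch err
    err isa MethodError
end
```

If the benchmark gap comes from the eager form skipping the branch, it is also skipping the out-of-bounds case, so the two versions are not equivalent in behaviour.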
Actually, I think the performance gain is just some kind of edge-case optimization. Consider this with your original version:
julia> itr = Iterators.filter(x -> x != 10, 1:10000);
julia> @btime _inbounds_nth($itr, 9999);
7.086 μs (0 allocations: 0 bytes)
julia> @btime _safe_nth($itr, 9999);
7.083 μs (0 allocations: 0 bytes)
In any case, I think that returning only the element, and not a new iterator starting from there, is not ideal, because one usually wants to continue the iteration afterwards. So I would consider something like:
julia> nth(itr, n) = Iterators.peel(Iterators.drop(itr, n-1))
julia> @btime nth($itr, 9999);
7.086 μs (0 allocations: 0 bytes)
But at the same time it is just a one-liner, so I'm not sure it is worth it.
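To make the peel-based variant concrete, here is a small usage sketch. The name nth_with_rest is hypothetical (it is the one-liner above under a different name, to avoid clashing with the other nth definitions in this thread). It shows that the result is a pair of the element and a lazy iterator over what follows.

```julia
# nth_with_rest is a hypothetical name for the peel-based one-liner.
nth_with_rest(itr, n) = Iterators.peel(Iterators.drop(itr, n - 1))

itr = Iterators.filter(x -> x != 10, 1:20)

# x is the 10th element that passes the filter; rest continues after it.
x, rest = nth_with_rest(itr, 10)

# rest is lazy; collect materialises the remaining elements.
remaining = collect(rest)
```

Note that on recent Julia versions (1.7 and later, as far as I know) peel returns nothing for an exhausted iterator, so the destructuring above would fail when n is out of bounds and every call site would need to handle that case.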
I actually think that a function such as nth(itr, n) is more of an endpoint in the lifetime of an iterator.
Therefore, when you call nth you get the end result and not the continuation of the iterator. Plus, it matches the intuitive action of "get me the nth element", without forcing the user to deal with the rest or status at every call site of the nth function. This follows the principle of least surprise.
For many intents and purposes, I see nth(itr, n) as a generalisation of the first(itr) function in Base:
nth(itr, n) = begin
    y = iterate(Base.Iterators.drop(itr, n-1))
    ifelse(isnothing(y), nothing, getindex(y, 1))
end
function first(itr)
    x = iterate(itr)
    x === nothing && throw(ArgumentError("collection must be non-empty"))
    x[1]
end
# it could become just this
# (not backward compatible and slower, I know, it's just to showcase)
first(itr) = nth(itr, 1)
In my opinion, the number of lines of code shouldn't matter when talking about APIs. If it's just a one-liner, all the better, but that shouldn't be a justification for leaving something out. Just for reference, this is the implementation of first(itr, n) and last(itr, n) in Base:
first(itr, n::Integer) = collect(Iterators.take(itr, n))
last(itr, n::Integer) = reverse!(collect(Iterators.take(Iterators.reverse(itr), n)))
Hello,
After searching far and wide in issues, PRs, and on Discourse, I could not find any discussion about adding an Iterators.nth(x, n) API just for ease of use and simplicity. This is the only other reference to this possibility I could find. I have played with it a little in the past during various projects and ended up with a slight evolution of the basic version mentioned by @stevengj in the linked post, which I carry around when needed:
which offers the ability to skip bounds checking at the expense of a crash (as opposed to just returning nothing).
but that offers decent performance benefits, although we can't escape the O(n) complexity without extra assumptions (not that I know of, at least) (btw, simple_nth also errors out when called out of bounds).
Instead of straight up opening a PR, I wanted to check whether there is any desire for this kind of small QOL piece of code and, more importantly, to check a couple of doubts with much more knowledgeable people:
Regarding skip_checks: is it possible to "retrofit" the @inbounds macro to allow calls like @inbounds Iterators.nth(itr, n)? Is that even a good idea?
The inbounds version only offers performance benefits in this particular case with a vector, so is making such a distinction even worth it at all?
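On the O(n) point above, one kind of "extra assumption" is that the input supports cheap indexing. The following is a rough sketch of how dispatch could pick a constant-time path for arrays while keeping the generic iterator fallback. nth_sketch is a hypothetical name, not an existing or proposed Base API, and the array method assumes that linear indexing is cheap for the array type in question.

```julia
# Hypothetical name; not an existing Base API.

# Generic fallback: walks the iterator, so the cost is O(n).
nth_sketch(itr, n::Integer) = begin
    y = iterate(Iterators.drop(itr, n - 1))
    y === nothing ? nothing : y[1]
end

# Extra assumption: arrays support linear indexing, so no iteration is needed.
function nth_sketch(a::AbstractArray, n::Integer)
    i = firstindex(a) + n - 1
    checkbounds(Bool, a, i) ? a[i] : nothing
end

v = collect(1:10000)
from_vector = nth_sketch(v, 9999)      # indexed path

filtered = Iterators.filter(x -> x != 10, 1:10000)
from_filter = nth_sketch(filtered, 9999)   # generic path, skips the value 10
```

With this shape the bounds check on the array path is a single comparison, which may make a separate unchecked variant less compelling, though that would need benchmarking.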