Open tkf opened 6 years ago
Great idea and thanks for the jumpstart! I'd love a PR.
I think it might be good to enforce the restriction that indices be strictly increasing. Perhaps just error if !issorted(slice)
? Python gets around this by doing the equivalent of only accepting StepRange{<:Integer, <:Integer}
.
done
should not call next
and advance the iterator. As a general rule, done
should be cheap and simple. It's okay for next
to return anything if done
has returned true
though, if that makes it easier.
Technically iteratorsize
is IsInfinite
if both iter
and slice
are IsInfinite
, and SizeUnknown
otherwise.
Don't forget eltype
:)
slice
has a particular meaning in Python which doesn't exist in Julia. I think we should choose another name for this. Brainstorming:
pick
index
takeat
takeindex
atindex
Another thought; this, nth
, and takenth
are fairly similar. I wonder if we can combine them.
Thanks for the feedback!
done should not call next and advance the iterator.
I don't think it's possible to do that here. That was actually my first try, but it didn't work:
struct Slice{I, S}
iter::I
slice::S
end
const slice = Slice
function Base.start(it::Slice)
return (0, 0, start(it.iter), start(it.slice))
end
Base.done(it::Slice, ::Void) = true
Base.done(it::Slice, state) = done(it.slice, state[4]) ||
done(it.iter, state[3])
function Base.next(it::Slice, state)
local x
n, k_prev, si, ss = state
k, ss = next(it.slice, ss)
if k <= k_prev
if k_prev < 1
error("Slicing index starts with a number smaller than 1.")
end
error("Slicing index is not strictly increasing ($k_prev -> $k).")
end
# @assert n < k
while n < k
if done(it.iter, si)
# return (????, (n, k, si, ss))
return nothing # cannot return any useful result here
end
x, si = next(it.iter, si)
n += 1
end
return (x, (n, k, si, ss))
end
Base.iteratorsize(::Type{<: Slice}) = Base.SizeUnknown()
using Base.Test
@show collect(slice(1:100, 1:2:5))
@show collect(1:2:5)
@test collect(slice(1:100, 1:2:5)) == collect(1:2:5)
@test collect(slice(1:5, 1:2:10)) == collect(1:2:5)
# This doesn't work:
@test collect(slice(1:10, [1, 3, 7, 11])) == [1, 3, 7, 11]
The problem is that Slice
iterator can't know when to stop unless advancing underlying Slice.iter
. See if done(it.iter, si)
block inside the while
loop.
As a general rule, done should be cheap and simple.
I agree that a cheap done
is the best choice if possible (which is the most of the cases), but it is not possible in some cases, including this one. It seems that this is the common approach for Julia < 1.0:
There's a fundamental awkwardness to using the start/next/done protocol to iterate things like streams or tasks that can't know if they're done or not until they've tried to get the next item. Currently, we work around this mismatch by advancing the iterator in done instead of in next.
--- StefanKarpinski in https://github.com/JuliaLang/julia/issues/8149
Hopefully, we can get rid of this sub-optimal solution in Julia 1.0: https://github.com/JuliaLang/julia/issues/18823
But until Julia 1.0, this is all we can do, I think. Is it OK to advance iterators in done
?
Technically iteratorsize is IsInfinite if both iter and slice are IsInfinite, and SizeUnknown otherwise.
Ah, good catch. Thank. I'll fix that when I'm sending a RP.
More brainstorming for the name:
iview
(iterating view)itake
iread
viewing
taking
reading
at
(maybe too short?)takeseq
takeiter
I like viewing
since we already have Base.view
for arrays. But it's not exactly similar to Base.view
since obviously you can't do setindex!
to iterator. It's more like a read-only view (and so I added iread
/reading
). But still viewing
looks nice..
From your suggestions, I think pick
is good. index
is also nice when you use it in the macro form: @index iter[5:3:end]
.
Another thought; this, nth, and takenth are fairly similar. I wonder if we can combine them.
Yeah I agree. I think we can dispatch on the type of the second argument (indices or an index) but I think we can only combine two, like this:
Combining "slice" and nth
:
f1(iter, indices) # == "slice"(iter, indices)
f1(iter, index::Integer) # == nth(iter, index)
Combining "slice" and takenth
:
f2(iter, indices) # == "slice"(iter, indices)
f2(iter, stride::Integer) # == takenth(iter, stride)
I think choosing one of the two cases narrows possible set of names. For example, the name viewing
and the interface f1
"slice"-&-nth
do not work well. But all of your names seem to work for both interfaces.
If what you mean by "combine them" is just reusing the implementation, we can do it after a more efficient specific implementation for indices::StepRange
is done. But I'd like to do a general implementation for a generic iterator first.
Come to think of it, I just realize I don't understand why takenth
works without the next
-in-done
hack. Reading the code, I noticed I was making a stronger assumption for how IterTools should work. That is to say, I thought takenth
tries to not advance the underlying iterator as much as possible. But it was not the case. That is to say, the code
for (i, x) in enumerate(iter)
if i % 2 == 1
f(iter, x)
end
end
and
for x in takenth(iter, 2)
f(iter, x)
end
are not equivalent if iter
is stateful. For example, the code below does not work:
using Base.Test
using IterTools
buf = IOBuffer(join(1:10, "\n"))
for (i, line) in enumerate(takenth(eachline(buf), 2))
@show i, line
if i > 2
break
end
end
@test readline(buf) == "7"
It is common for stateful algorithm to provide an iterator interface (e.g., DifferentialEquations.jl) and it would be nice if IterTools.jl can interact well with such algorithms. For this, IterTools has to have a strong guarantee that it advances underlying iterator only at the very last moment. It means to advance iterators in done
for Julia 0.6. But it will be easier to do in 1.0.
Do you think this stronger guarantee on how IterTools advances underlying iterators makes sense? Do you think the next
-in-done
hack is OK for Julia 0.6?
So I implemented iteratorsize/iteratoreltype/eltype and also length for the case indices are from OrdinalRange: https://gist.github.com/tkf/16b06fd6a4943c8d524295152c231b13
I can send a PR with this code (with some addition of docstring) if next-in-done hack is OK and the name is decided.
Can you come up with an example you're interested in where your implementation is faster than this one:
function myviewing(itr, inds)
imap(last, Iterators.filter((x)->first(x) in inds, enumerate(itr)))
end
It's faster in most cases. For very large inds
we could reduce search time by checking issorted
and using searchsorted
, or allocating a Set
.
My implementation is doing mutation on fields of Any type. It's certainly not aimed for performance, at least at the moment. It's more about implementing it correctly. But I think faster implementation is possible in Julia 1.0.
The filter
-based implementation doesn't work when inds
is non-terminating. I don't know when my implementation becomes faster compared to it. It is always possible to "dispatch" to the filter
-based implementation when inds
is finite (and sufficiently small?).
Also, myviewing
does not satisfy the "stronger guarantee" I was emphasizing:
using Base.Test
using IterTools
function myviewing(itr, inds)
imap(last, Iterators.filter((x)->first(x) in inds, enumerate(itr)))
end
@testset "myviewing" begin
buf = IOBuffer(join(1:10, "\n"))
inds = [1, 3, 5]
for (i, line) in zip(inds, myviewing(eachline(buf), inds))
@test string(i) == line
end
@test_broken readline(buf) == "6"
end
But this is how Iterators.filter
behaves in 0.6. I wonder if Julia devs want to fix the semantics of it.
Is there a way to slice/index arbitrary iterator in Julia? Can you add a function ("
slice
") that can do that? It's like Python'sitertools.islice
and it doesbut it also works with infinite iterator and indices, provided that indices are strictly increasing (otherwise you need to cache the whole history). I actually coded up a quick implementation:
Side notes: Besides obvious lack of documentation and good error message (rather than
@assert
), the most smelly part of the code is the mutableSliceState
and the calls to thenext
of the underlying iterators. But it seems that this is the usual way to solve this kind of problem for Julia <= 0.6. I guess it's going to be easier with the new iteration protocol for Julia 1.0.Can IterTools.jl and/or Base already do it? Am I missing something? If not, do you think it is a good API to add in IterTools.jl? I can make a PR with a bit more refined code if you like.