JuliaCollections / IterTools.jl

Common functional iterator patterns
Other
153 stars 29 forks source link

`iterated` performance #40

Closed thjread closed 6 years ago

thjread commented 6 years ago

The iterated iterator is very slow in simple cases due to type instability, and inability to inline the function being iterated.

For example @time IterTools.nth(IterTools.iterated(x -> x + 1, 0.), 10_000_000) gives 2.791875 seconds (50.01 M allocations: 916.153 MiB, 4.42% gc time) on my machine.

One possible fix is to parametrise the Iterated struct by the function being used, so that iterate is specialised for every such function (although I'm new to Julia, so unsure if this would be considered idiomatic or an ugly hack).

For example

struct MyIterated{T, f}
    seed::T
    function MyIterated(f::Function, seed::T) where T
        new{T, f}(seed)
    end
end
Base.IteratorSize(::Type{<:MyIterated}) = Base.IsInfinite()
Base.IteratorEltype(::Type{<:MyIterated}) = Base.EltypeUnknown()

my_iterated(f, seed) = MyIterated(f, seed)

Base.iterate(it::MyIterated) = (it.seed, it.seed)
function Base.iterate(it::MyIterated{T, f}, state) where {T, f}
    newval = f(state)
    return (newval, newval)
end

Then @time IterTools.nth(my_iterated(x -> x + 1, 0.), 10_000_000) gives 0.013682 seconds, which is as fast as a hand-written loop.

iamed2 commented 6 years ago

In general specializing on Function type is okay if you aren't creating too many types. I've leaned towards doing it in IterTools as I think it's important to provide fast implementations at the expense of a bit of compile time. In this case the trade-off is absolutely worth it! However, in Julia we always choose to dispatch on the type of the function, not the function instance itself (each function has a unique type). Here is how your code would look doing it that way (giving the same performance you observed):

struct MyIterated2{T, F}
    func::F
    seed::T
end

Base.IteratorSize(::Type{<:MyIterated2}) = Base.IsInfinite()
Base.IteratorEltype(::Type{<:MyIterated2}) = Base.EltypeUnknown()

my_iterated2(f, seed) = MyIterated2(f, seed)

Base.iterate(it::MyIterated2) = (it.seed, it.seed)
function Base.iterate(it::MyIterated2{T, F}, state) where {T, F}
    newval = it.func(state)
    return (newval, newval)
end

Another Julia tip: do benchmarks with BenchmarkTools.jl (e.g. @btime) to measure more accurately.

I definitely would accept this, would you care to make a PR?

thjread commented 6 years ago

Thanks, that looks good! I'll make a PR