JuliaCollections / IterTools.jl

Common functional iterator patterns

Exponential time complexity for recursive chains with increasing NTuple length #29

Closed. nkemnitz closed this issue 6 years ago.

nkemnitz commented 6 years ago

Hi all,

there is some strange exponential time and memory behavior that I don't understand: I have a tree with some additional data (a Vector{Foo}) on every node, and eventually I want to iterate through all the Foo objects without unnecessary allocations. At first I thought I had run into an infinite loop, because the CPU got stuck at 100% and nothing else happened, but I was able to track the problem down to the following minimal example. I'll just use a linear list instead of a wider tree here, and run length() instead of collecting items; that is enough to show the exponential behavior:

versioninfo()

using IterTools

struct Foo end

"""
Creates a Chain{Tuple{Chain{Tuple{ ... NTuple{n,Vector{Foo}} ... }}}} of depth `d`,
with an NTuple of length `n` on the lowest level, containing one-element Vectors of `Foo`.
"""
function g(d::Int, n::Int)
  if d == 0
    return [Foo()]
  elseif d == 1
    return chain([g(d - 1, n) for i in 1:n]...)
  else
    return chain(g(d - 1, n))
  end
end

for i in 1:19
  @time length(g(5, i))
end

Results for the above example (d=5, n=1:19):

Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

  0.184802 seconds (123.40 k allocations: 6.813 MiB, 26.74% gc time)
  0.115444 seconds (123.95 k allocations: 6.828 MiB)
  0.115224 seconds (129.93 k allocations: 7.195 MiB)
  0.175451 seconds (136.42 k allocations: 7.585 MiB, 31.63% gc time)
  0.120656 seconds (143.94 k allocations: 8.109 MiB)
  0.124801 seconds (153.71 k allocations: 8.956 MiB)
  0.176149 seconds (168.00 k allocations: 10.415 MiB, 28.02% gc time)
  0.141948 seconds (191.84 k allocations: 13.889 MiB, 2.80% gc time)
  0.144715 seconds (235.74 k allocations: 20.443 MiB)
  0.167659 seconds (321.68 k allocations: 37.108 MiB, 3.56% gc time)
  0.208135 seconds (495.73 k allocations: 69.814 MiB, 3.25% gc time)
  0.300323 seconds (854.18 k allocations: 138.819 MiB, 3.42% gc time)
  0.487694 seconds (1.60 M allocations: 307.199 MiB, 3.67% gc time)
  0.874714 seconds (3.14 M allocations: 639.010 MiB, 3.37% gc time)
  1.708150 seconds (6.36 M allocations: 1.311 GiB, 3.37% gc time)
  3.567857 seconds (13.05 M allocations: 3.163 GiB, 3.62% gc time)
  7.628431 seconds (26.95 M allocations: 6.670 GiB, 3.52% gc time)
 16.260810 seconds (55.80 M allocations: 14.041 GiB, 3.78% gc time)
 34.626086 seconds (115.57 M allocations: 29.456 GiB, 4.18% gc time)

If I run the same loop again, the times are suddenly what I would've expected in the first place:

  0.000087 seconds (27 allocations: 992 bytes)
  0.000060 seconds (29 allocations: 1.094 KiB)
  0.000045 seconds (31 allocations: 1.219 KiB)
  0.000047 seconds (33 allocations: 1.344 KiB)
  0.000049 seconds (35 allocations: 1.469 KiB)
  0.000047 seconds (37 allocations: 1.594 KiB)
  0.000042 seconds (39 allocations: 1.719 KiB)
  0.000041 seconds (41 allocations: 1.844 KiB)
  0.000045 seconds (43 allocations: 1.969 KiB)
  0.000042 seconds (45 allocations: 2.094 KiB)
  0.000040 seconds (47 allocations: 2.219 KiB)
  0.000041 seconds (49 allocations: 2.344 KiB)
  0.000040 seconds (51 allocations: 2.469 KiB)
  0.000039 seconds (53 allocations: 2.594 KiB)
  0.000042 seconds (55 allocations: 2.719 KiB)
  0.000046 seconds (57 allocations: 2.844 KiB)
  0.000048 seconds (59 allocations: 2.969 KiB)
  0.000044 seconds (61 allocations: 3.094 KiB)
  0.000045 seconds (63 allocations: 3.219 KiB)

I guess my question is: Why is that? And how can I fix it efficiently? I could change chain to vcat, but I was hoping to avoid any unnecessary copies/allocations.
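
(For comparison, the vcat variant mentioned above might look roughly like the sketch below; g_vcat is just an illustrative name. Every level concatenates into a plain Vector{Foo}, so no nested Chain{Tuple{...}} types are built up, but each level copies its children's data.)

struct Foo end

# Eager alternative: concatenate into a single Vector{Foo} at every level.
# The result type is always Vector{Foo}, so no nested iterator types pile up,
# but the elements are copied on every vcat.
function g_vcat(d::Int, n::Int)
  if d == 0
    return [Foo()]
  elseif d == 1
    return vcat([g_vcat(d - 1, n) for i in 1:n]...)
  else
    return vcat(g_vcat(d - 1, n))
  end
end

for i in 1:19
  @time length(g_vcat(5, i))
end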

iamed2 commented 6 years ago

I know why this is happening but not why it's so bad. What you're seeing is compilation time/cost.

For each tuple size it compiles a new version of chain, and since you're calling it recursively with splatting in a loop, I think it may be recompiling the same functions over and over. I don't think it's your fault, though. It's either something I need to fix in the chain code or some kind of compiler issue (which probably has a workaround).
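
(A rough way to see where those specializations come from, assuming the IterTools version from the report above, where chain wraps its arguments in a tuple as the Chain{Tuple{...}} docstring in the example shows:)

using IterTools

struct Foo end

# Every argument count gives chain its own Chain{Tuple{...}} type, so the
# iteration methods have to be compiled separately for each n, and the
# recursion in g() wraps each of those in yet another Chain type.
for n in 1:3
  println(typeof(chain([Foo[] for i in 1:n]...)))
end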

I'll look into this at the end of this week but you might be better off using Iterators.flatten in the meantime.
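
(For reference, a rough sketch of that workaround, keeping the structure of g above but swapping chain for Iterators.flatten; g_flat is just an illustrative name. flatten takes a single iterable of iterables, so the element count n no longer appears in the types.)

struct Foo end

# Same shape as g above, but built on Iterators.flatten instead of chain.
# flatten wraps a single Vector (or 1-tuple) of child iterators, so the
# resulting type depends only on the depth d, not on n.
function g_flat(d::Int, n::Int)
  if d == 0
    return [Foo()]
  elseif d == 1
    return Iterators.flatten([g_flat(d - 1, n) for i in 1:n])
  else
    return Iterators.flatten((g_flat(d - 1, n),))
  end
end

# length() may not be defined for flatten over vectors of varying lengths,
# so count the elements by iterating instead.
for i in 1:19
  @time count(x -> true, g_flat(5, i))
end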

nkemnitz commented 6 years ago

Oh, thanks for the quick response and the flatten suggestion! I will try that one. Fun fact: at least for the above example, if I set d to a value higher than 5, the weird behavior does not occur either... so it might really not be an IterTools thing...

iamed2 commented 6 years ago

Hmm, yeah, that sounds like the impact of some sort of heuristic limit controlling compilation of recursive functions.