JuliaArrays / LazyArrays.jl

Lazy arrays and linear algebra in Julia
MIT License
303 stars 25 forks source link

Stack overflow for concatenation of large number of arrays #97

Open baggepinnen opened 4 years ago

baggepinnen commented 4 years ago

The current way of constructing Vcat objects, by splatting/slurping, causes stack overflow when concatenating a large number of arrays since special methods of these functions are compiled for the particular number of input arguments.

My usecase is LazyWAVFiles.jl which treats a folder of WAV files as one large array, such a folder may contain quite a lot of files...

baggepinnen commented 4 years ago

I found a way to create the lazy array such that the args tuple got the type NTuple using

ApplyArray{T, N, typeof(vcat), NTuple{length(arrays),L}}(vcat, NTuple{length(arrays),L}(ntuple(i->arrays[i],length(arrays))))
dlfivefifty commented 4 years ago

Would you be able to make a PR?

baggepinnen commented 4 years ago

I could. Would the desired signature be something like

Vcat(::AbstractArray{AbstractArray})

containing essentially

ApplyArray{T, N, typeof(vcat), NTuple{length(arrays),L}}(vcat, NTuple{length(arrays),L}(ntuple(i->arrays[i],length(arrays))))

?

dlfivefifty commented 4 years ago

Vcat(A) is short for ApplyArray(vcat,A), and ApplyArray(f,A) is a lazy version of f(A).

So your proposed signature is not good as it should return something equivalent to vcat(::AbstractArray{<:AbstractArray}), that is have the following behaviour:

julia> vcat([[1,2],[3,4]])
2-element Array{Array{Int64,1},1}:
 [1, 2]
 [3, 4]

Actually, I don't think you want Vcat as if you have a large number of arrays you don't want to work with tuples. The best alternative is probably mortar from BlockArrays.jl:

julia> mortar([[1,2],[3,4]])
2-blocked 4-element BlockArray{Int64,1}:
 1
 2
 ─
 3
 4

Note this is already lazy (in the sense that it doesn't copy its arguments).

baggepinnen commented 4 years ago

Interesting, I was not aware of mortar. My use of LazyArrays to this purpose is working fine though and with the workaround above I have good performance even for very many arrays. I guess we can close this issue, and maybe add a line to the docs hinting at mortar for future users of large numbers of arrays.

dlfivefifty commented 4 years ago

Perhaps you could add your work around to overload:

Vcat(A::AbstractArray...)

?

baggepinnen commented 4 years ago

Methods using varargs are also troublesome if the number of arrays is large since the compiler will hit some limit for the number of arguments to a function. Overloading it does not help, the splatting of the vector of vectors must be avoided. I guess if anyone runs into this issue, they will find this discussion and can make use of the workaround above.