Open aplavin opened 2 years ago
ArraysOfArrays is fundamentally broken: https://github.com/JuliaArrays/ArraysOfArrays.jl/issues/2. That can cause all sorts of trouble.
Interesting, you mean the x[1] <: eltype(x)
not holding in ArraysOfArrays? Indeed, that's unfortunate, didn't notice before.
However, this issue is independent and will remain even when the ArraysOfArrays eltype is fixed so that x[1] <: eltype(x)
.
ArrayOfArrays can push!
any iterable, such as a range, directly: it's significantly more efficient than materializing the iterable to an Array. Meanwhile, StructArrays eagerly attempts to convert the passed value to the eltype of component arrays.
I believe the latter is the actual issue, StructArrays shouldn't decide for all underlying arrays that an explicit conversion is needed.
I am also realizing that things are problematic for array types where the result of indexing is some custom type that has a reference to data from the original array, For example the same happens with StringArray{WeakRefString}
from WeakRefStrings:
julia> using WeakRefStrings
julia> sa = StringArray{WeakRefString}(["x", "y"])
2-element StringVector{WeakRefString}:
"x"
"y"
julia> push!(sa, "z")
3-element StringVector{WeakRefString}:
"x"
"y"
"z"
julia> push!(sa, convert(eltype(sa), "z"))
ERROR: MethodError: no method matching WeakRefString(::String)
CategoricalArrays have the same issue: you can push!
things that cannot be converted to the eltype
of the array.
I imagine one would need to pick up between
1
to a StructArray{ComplexF64}
goes back to erroringpush!
values that cannot be converted to the eltype
Neither choice seems great, but I don't really see a clean way out of this. Maybe something along the lines of #229 is a reasonable compromise, but even then things are only fixed when the entry is passed as a tuple or named tuple.
The reason is that push! does not recurrently call push! if some columns are StructArrays, but the whole thing is unrolled, for performance reasons.
Maybe at some point this unrolling wouldn't be required? Maybe even at current julia versions? :) Not sure about exact benchmark, but this can be a way forward to fixing this issue.
The eager automatic conversion introduced in #227 (cc @timholy @piever) leads to many allocations when this conversion isn't actually needed. For example: StructArrays@0.6.7 - 25 allocations:
StructArrays@0.6.8 - 1 million allocations:
Of course, these allocations go away if automatic conversion (https://github.com/JuliaArrays/StructArrays.jl/blob/1581d70090de650646277e4eee4e33794e55b9ac/src/utils.jl#L197-L199) is disabled by
StructArrays.maybe_convert_elt(::Type{T}, vals::Tuple) where {T} = vals
.This change in 0.6.8 is highly breaking performance-wise. Can the new automatic conversion approach be revised and made less eager somehow?