Open davidagold opened 9 years ago
The tweaks in https://github.com/johnmyleswhite/NullableArrays.jl/commit/76309a08b09b57e47cbfb23bb1a6243c18abfe9b#diff-88161f25d2abf2976952b3a2c500b755 appear to have closed some of the performance gaps that can be seen above. The remaining gaps seem restricted to cases in which a mapreduce-related method is called on a NullableArray{T}
with empty values with skipnull=false
. These gaps could be closed if we decide that, in such cases, it is okay simply to return Nullable{T}()
instead of letting the type of the resultant empty Nullable
object be determined by mapreducing without skipping.
IMO both approaches to determining the type of the empty Nullable
to be returned are somewhat arbitrary. One the one hand, and for instance, applying such a method to an X::NullableArray{Int}
might almost certainly return a Nullable{Float}
if there are no null entries/if null entries are skipped. In this case, returning Nullable{Int}()
may not seem appropriate. However, one could also imagine a case in which present values of X
are so constrained that applying a different mapreduce method would always return a Nullable{Int}
if there are no null entries/if null entries are skipped. In this case, letting the parameter of the returned Nullable
object be determined by whatever values to which "empty" entries in the X.values
field happen to be initialized seems entirely whimsical.
I'm slightly more of the mind that such cases ought to return Nullable{T}()
and avoid determining the type via lifted computation. I think this approach has the virtues of predictability and performance, at least with respect to actually returning the empty Nullable
result from the mapreduce method. If users require another behavior, they can annotate the returned object.
Most recent run:
julia> profile_reduce_methods()
Method: mapreduce(f, op, A) (0 missing entries)
for Array{Float64}: 297.994 milliseconds (15008 k allocations: 229 MB, 19.00% gc time)
for NullableArray{Float64}: 302.860 milliseconds (15008 k allocations: 229 MB, 18.29% gc time)
for DataArray{Float64}: 327.860 milliseconds (15008 k allocations: 229 MB, 17.56% gc time)
Method: mapreduce(f, op, A) (~half missing entries, skip=false)
for NullableArray{Float64}: 498.262 milliseconds (15008 k allocations: 458 MB, 28.21% gc time)
for DataArray{Float64}: 277.049 milliseconds (5009 k allocations: 78259 KB, 7.22% gc time)
Method: mapreduce(f, op, A) (~half missing entries, skip=true)
for NullableArray{Float64}: 237.232 milliseconds (7542 k allocations: 115 MB, 11.15% gc time)
for DataArray{Float64}: 224.135 milliseconds (7517 k allocations: 115 MB, 14.12% gc time)
Method: reduce(f, op, A) (0 missing entries)
for Array{Float64}: 2.716 milliseconds (1 allocation: 16 bytes)
for NullableArray{Float64}: 5.723 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 2.728 milliseconds (2 allocations: 96 bytes)
Method: reduce(f, op, A) (~half missing entries, skip=false)
for NullableArray{Float64}: 6.121 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 4.340 microseconds (1 allocation: 80 bytes)
Method: reduce(f, op, A) (~half missing entries, skip=true)
for NullableArray{Float64}: 52.310 milliseconds (3 allocations: 192 bytes)
for DataArray{Float64}: 52.839 milliseconds (3 allocations: 176 bytes)
Method: sum(A) (0 missing entries)
for Array{Float64}: 3.348 milliseconds (1 allocation: 16 bytes)
for NullableArray{Float64}: 7.068 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 3.102 milliseconds (2 allocations: 96 bytes)
Method: sum(f, A) (0 missing entries)
for Array{Float64}: 317.245 milliseconds (15008 k allocations: 229 MB, 18.68% gc time)
for NullableArray{Float64}: 295.854 milliseconds (15008 k allocations: 229 MB, 18.23% gc time)
for DataArray{Float64}: 327.310 milliseconds (15008 k allocations: 229 MB, 17.98% gc time)
Method: prod(A) (0 missing entries)
for Array{Float64}: 8.554 milliseconds (1 allocation: 16 bytes)
for NullableArray{Float64}: 12.285 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 11.440 milliseconds (2 allocations: 96 bytes)
Method: prod(f, A) (0 missing entries)
for Array{Float64}: 302.056 milliseconds (15008 k allocations: 229 MB, 16.97% gc time)
for NullableArray{Float64}: 320.901 milliseconds (15008 k allocations: 229 MB, 18.20% gc time)
for DataArray{Float64}: 331.193 milliseconds (15008 k allocations: 229 MB, 15.79% gc time)
Method: minimum(A) (0 missing entries)
for Array{Float64}: 3.910 milliseconds (1 allocation: 16 bytes)
for NullableArray{Float64}: 6.827 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 4.421 milliseconds (2 allocations: 96 bytes)
Method: minimum(f, A) (0 missing entries)
for Array{Float64}: 278.363 milliseconds (10000 k allocations: 153 MB, 13.22% gc time)
for NullableArray{Float64}: 282.217 milliseconds (10000 k allocations: 153 MB, 12.66% gc time)
for DataArray{Float64}: 283.047 milliseconds (10000 k allocations: 153 MB, 12.57% gc time)
Method: maximum(A) (0 missing entries)
for Array{Float64}: 4.610 milliseconds (1 allocation: 16 bytes)
for NullableArray{Float64}: 6.200 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 4.316 milliseconds (2 allocations: 96 bytes)
Method: maximum(f, A) (0 missing entries)
for Array{Float64}: 300.411 milliseconds (10000 k allocations: 153 MB, 13.01% gc time)
for NullableArray{Float64}: 309.668 milliseconds (10000 k allocations: 153 MB, 12.73% gc time)
for DataArray{Float64}: 341.655 milliseconds (10000 k allocations: 153 MB, 12.69% gc time)
Method: sum(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 6.230 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 4.670 microseconds (1 allocation: 80 bytes)
Method: sum(f, A) (~half missing entries, skip=false)
for NullableArray{Float64}: 504.233 milliseconds (15008 k allocations: 458 MB, 28.99% gc time)
for DataArray{Float64}: 286.009 milliseconds (5009 k allocations: 78259 KB, 7.37% gc time)
Method: prod(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 24.736 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 24.600 microseconds (1 allocation: 80 bytes)
Method: prod(f, A) (~half missing entries, skip=false)
for NullableArray{Float64}: 485.244 milliseconds (15008 k allocations: 458 MB, 30.08% gc time)
for DataArray{Float64}: 287.764 milliseconds (5009 k allocations: 78259 KB, 7.06% gc time)
Method: minimum(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 13.752 microseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 3.793 microseconds (1 allocation: 80 bytes)
Method: minimum(f, A) (~half missing entries, skip=false)
for NullableArray{Float64}: 334.809 milliseconds (7501 k allocations: 191 MB, 22.37% gc time)
minimum(f, A::DataArray) currently incurs error
Method: maximum(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 39.744 microseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 5.082 microseconds (1 allocation: 80 bytes)
Method: maximum(f, A) (~half missing entries, skip=false)
for NullableArray{Float64}: 317.843 milliseconds (7501 k allocations: 191 MB, 22.94% gc time)
maximum(f, A::DataArray) currently incurs error
Method: sum(A) (~half missing entries, skip=true)
for NullableArray{Float64}: 49 milliseconds (3 allocations: 192 bytes)
for DataArray{Float64}: 53.485 milliseconds (3 allocations: 176 bytes)
Method: sum(f, A) (~half missing entries, skip=true)
for NullableArray{Float64}: 236.735 milliseconds (7542 k allocations: 115 MB, 11.03% gc time)
for DataArray{Float64}: 208.877 milliseconds (7517 k allocations: 115 MB, 14.66% gc time)
Method: prod(A) (~half missing entries, skip=true)
for NullableArray{Float64}: 53.514 milliseconds (3 allocations: 192 bytes)
for DataArray{Float64}: 52.948 milliseconds (3 allocations: 176 bytes)
Method: prod(f, A) (~half missing entries, skip=true)
for NullableArray{Float64}: 213.132 milliseconds (7501 k allocations: 114 MB, 12.32% gc time)
for DataArray{Float64}: 205.867 milliseconds (7501 k allocations: 114 MB, 16.28% gc time)
Method: minimum(A) (~half missing entries, skip=true)
for NullableArray{Float64}: 55.691 milliseconds (3 allocations: 192 bytes)
for DataArray{Float64}: 49.199 milliseconds (3 allocations: 176 bytes)
Method: minimum(f, A) (~half missing entries, skip=true)
for NullableArray{Float64}: 220.994 milliseconds (7501 k allocations: 114 MB, 13.32% gc time)
for DataArray{Float64}: 215.215 milliseconds (7501 k allocations: 114 MB, 14.37% gc time)
Method: maximum(A) (~half missing entries, skip=true)
for NullableArray{Float64}: 59.045 milliseconds (3 allocations: 192 bytes)
for DataArray{Float64}: 52.982 milliseconds (3 allocations: 176 bytes)
Method: maximum(f, A) (~half missing entries, skip=true)
for NullableArray{Float64}: 209.707 milliseconds (7501 k allocations: 114 MB, 13.96% gc time)
for DataArray{Float64}: 225.224 milliseconds (7501 k allocations: 114 MB, 12.86% gc time)
Method: sumabs(A) (0 missing entries)
for Array{Float64}: 2.486 milliseconds (1 allocation: 16 bytes)
for NullableArray{Float64}: 4.855 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 2.606 milliseconds (2 allocations: 96 bytes)
Method: sumabs2(A) (0 missing entries)
for Array{Float64}: 8.069 milliseconds (1 allocation: 16 bytes)
for NullableArray{Float64}: 5.849 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 5.690 milliseconds (2 allocations: 96 bytes)
Method: sumabs(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 31.779 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 4.673 microseconds (1 allocation: 80 bytes)
Method: sumabs2(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 28.866 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 4.015 microseconds (1 allocation: 80 bytes)
Method: sumabs(A) (~half missing entries, skip=true)
for NullableArray{Float64}: 50.623 milliseconds (3 allocations: 192 bytes)
for DataArray{Float64}: 56.097 milliseconds (3 allocations: 176 bytes)
Method: sumabs2(A) (~half missing entries, skip=true)
for NullableArray{Float64}: 57.878 milliseconds (3 allocations: 192 bytes)
for DataArray{Float64}: 56.935 milliseconds (3 allocations: 176 bytes)
I'm not sure I completely understand the problem, but this sounds similar to the question of the type to return from a comprehension on an empty iterator input. One of the solutions discussed there is to return an Array{Union()}
:
https://github.com/JuliaLang/julia/issues/7258#issuecomment-82602572
https://github.com/JuliaLang/julia/pull/11034
Would it make sense to return a Nullable{Union()}
here?
That issue does seem quite similar, insofar as it too concerns what to do with the eltype
of an empty container returned from some computation. But the present issue is somewhat different. Let me try to make it more explicit.
Currently, lifted binary operators such as +(x::Nullable{S1}, y::Nullable{S2})
will return Nullable(x.value + y.value, x.isnull | y.isnull)
if S1
and S2
are isbits
types, and will throw an error otherwise. Thus we obtain the following behavior:
julia> Y = NullableArray(Int, 5)
5-element NullableArrays.NullableArray{Int64,1}:
Nullable{Int64}()
Nullable{Int64}()
Nullable{Int64}()
Nullable{Int64}()
Nullable{Int64}()
julia> Y[2:5] = [2:5...]
4-element Array{Int64,1}:
2
3
4
5
julia> Y
5-element NullableArrays.NullableArray{Int64,1}:
Nullable{Int64}()
Nullable(2)
Nullable(3)
Nullable(4)
Nullable(5)
julia> reduce(+, Y)
Nullable{Int64}()
julia> ans.value
12884901905
This is fine, since you oughtn't to expect a meaningful (or even safe) answer from retrieving the value
field of a null Nullable
. However, it does mean that, depending on the nature of the reducing operation op
, the "value" of a null entry can determine the type of the empty Nullable
object returned from reducing op
over a NullableArray
:
julia> op(x::Nullable, y::Nullable) = Nullable(x.value + y.value < 10 ? 5 : 5.0, x.isnull | y.isnull)
op (generic function with 1 method)
julia> X = NullableArray([1:5...], [fill(false, 4); true])
5-element NullableArrays.NullableArray{Int64,1}:
Nullable(1)
Nullable(2)
Nullable(3)
Nullable(4)
Nullable{Int64}()
julia> reduce(op, X)
Nullable{Float64}()
julia> X.values[5] = 1
1
julia> X[5]
Nullable{Int64}()
julia> reduce(op, X)
Nullable{Int64}()
Arguably, any user who writes such a whimsical function probably deserves equally whimsical behavior when reducing with that function. That being said, I think this behavior is less than desirable, since it is reasonable to expect that computing the return type of an empty Nullable
oughtn't to depend on the values of null Nullable
s involved in the computation.
Note that this issue only concerns instances where one calls a reducing method with skipnull=false
, which is the default. If skipnull
is set to true, then the "values" of null entries don't affect any aspect of the resultant Nullable
object.
One alternative is to forego the reducing computations entirely in the presence of null entries and simply return Nullable{T}()
, where T
is computed in a way that does not involve the values of null entries. Obvious candidates are to set T
equal to
eltype(X)
Base.return_types(f, (eltype(X), eltype(X)))
eltype(reduce(op, X, skipnull=true))
(i.e. the eltype
of the non-null Nullable
that would have been returned had reduce
been called with skipnull=true
) Union{}
Thoughts about these options:
eltype(X)
totally ignores the fact that the elements of X
are being reduced.Nullable
s incurs all of the problems mentioned in the issue that @nalimilan referenced.Union{}
is that it may introduce performance problems down the line and it also doesn't capitalize on the information that is available about X
and op
. The latter point is also the reason for which the issue concerning comprehensions over empty arrays isn't entirely analogous.Some more thoughts:
mapreduce
, since one can imagine equally pathological functions passed for the map
part.NA
. That's why the current performance discrepancies show up in methods like:Method: prod(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 24.736 milliseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 24.600 microseconds (1 allocation: 80 bytes)
eltype(X)
. I've implemented this where possible, henceMethod: minimum(A) (~half missing entries, skip=false)
for NullableArray{Float64}: 13.752 microseconds (2 allocations: 112 bytes)
for DataArray{Float64}: 3.793 microseconds (1 allocation: 80 bytes)
Method: sum(f, A) (~half missing entries, skip=false)
for NullableArray{Float64}: 504.233 milliseconds (15008 k allocations: 458 MB, 28.99% gc time)
for DataArray{Float64}: 286.009 milliseconds (5009 k allocations: 78259 KB, 7.37% gc time)
and
Method: sum(f, A) (~half missing entries, skip=true)
for NullableArray{Float64}: 236.735 milliseconds (7542 k allocations: 115 MB, 11.03% gc time)
for DataArray{Float64}: 208.877 milliseconds (7517 k allocations: 115 MB, 14.66% gc time)
seems to provide good reason at least to adopt option 3 for now, since, at least based on these numbers, this ought to perform much better than the current implementation and possibly on a par with the DataArrays
implementation.
Here are my results from running the profiling methods included in https://github.com/johnmyleswhite/NullableArrays.jl/blob/master/perf/mapreduce.jl:
Though more than for
broadcast
, there is still relatively little specialized code formapreduce
other specialized methods for skipping over null entries and hooking them into the generalmapreduce
interface. A few noteworthy items:sumabs
andsumabs2
are 10x faster than they were last week without my having touched them. I suspect this is due to improvements in Base Julia, possibly to do with codegen? Now I'm beginning to understand the importance of rigorously tracking performance against environmental variables, since it would have been interesting to see what caused the speedup.NullableArray
s as it is forDataArray
s, but that speedup is lost by the time get to the exposed interface.NullableArrays
.