JuliaStats / NullableArrays.jl

DEPRECATED Prototype of the new JuliaStats NullableArrays package
Other
35 stars 21 forks source link

mapreduce() works, but not map() #83

Open nalimilan opened 9 years ago

nalimilan commented 9 years ago

mapreduce works when no missing values are present in a NullableArray, but map doesn't. None of them work when a missing value is present. Not sure what's the intended behavior.

julia> a = NullableArray([1,2,3])
3-element NullableArray{Int64,1}:
 1
 2
 3

julia> map(exp, a)
ERROR: MethodError: `exp` has no method matching exp(::Nullable{Int64})
 [inlined code] from essentials.jl:58
 in ___F_#21__ at /home/milan/.julia/NullableArrays.jl/src/map.jl:124
 in __map#3__ at /home/milan/.julia/NullableArrays.jl/src/map.jl:172

julia> mapreduce(exp, +, a)
Nullable(30.19287485057736)

julia> b = NullableArray([1,2,3], [false, false, true])
3-element NullableArray{Int64,1}:
     1
     2
 #NULL

julia> mapreduce(exp, +, b)
ERROR: MethodError: `exp` has no method matching exp(::Nullable{Int64})
 [inlined code] from essentials.jl:58
 in _mapreduce at reduce.jl:143
 [inlined code] from /home/milan/.julia/NullableArrays.jl/src/indexing.jl:4
 in _mapreduce at /home/milan/.julia/NullableArrays.jl/src/reduce.jl:78
 in __mapreduce#7__ at /home/milan/.julia/NullableArrays.jl/src/reduce.jl:99
nalimilan commented 9 years ago

I guess a related issue it that the error should mention the lift and skipnull arguments. I can make a PR for that it you like.

davidagold commented 9 years ago

Ah yes, I remember this. I told myself I'd make a note of this and fix it. That didn't happen.

What's going on is that there is a part of the NullableArrays mapreduce interface that checks whether or not any of the entries in the argument array X is null and, if not, passes X.values, along with function/operation to be mapreduced, to the mapreduce interface in Base. My primary intention in that move was to boost performance, but it appears I forgot to add a check for the value of skipnull.

I don't know if we need to change the error if the documentation makes it clear that this behavior is to be expected. This also plays into other conversations about whether or not skipnull should be set to true or false by default.

nalimilan commented 9 years ago

I'd say: 1) Change map to work when no values are missing. 2) Add details to the error message to mention the fact that a missing value was encountered and that skipnull=false. Even if the docs mention this, the user who did not expect missing values in the input, or forgot to pass skipnull, could be confused by the error about a missing method. I guess the fundamental issue here is that map/mapreduce have lifting semantics in some cases, but those of a standard call in others.

davidagold commented 9 years ago

@johnmyleswhite I still don't understand the technical details of custom error messages possibly incurring performance penalties due to heap allocations, so I'm deferring to you on whether or not adding our own error message here is a good/bad idea.

AFAIK, The only case in which any of map, mapreduce or broadcast automatically lift is the case in which one mapreduces over a NullableArray that has zero null entries. I think it may be just as confusing to have map and friends lift by default when there are no null entries but not lift by default when there are null entries -- although this actually is a lifting system that I hadn't ever really thought about.