JuliaStats / DataArrays.jl

DEPRECATED: Data structures that allow missing values
Other
53 stars 50 forks source link

Broadcasted not (.!) giving inconsistent return type when DataArray in expression #263

Open joshbode opened 7 years ago

joshbode commented 7 years ago

The broadcasted not operator .! (new in 0.6) is returning results with unexpected type when used directly on expressions containing a DataArray, e.g.

isa(.!isna.(x), DataArray) == true

whereas, if the BitArray result is stored in an intermediate variable:

y = isna.(x)
isa(.!y,  BitArray) == true

Example:

0.6.0-rc3.0> using DataArrays
0.6.0-rc3.0> x = DataArray(1:3);

0.6.0-rc3.0> isna.(x)
3-element BitArray{1}:
 false
 false
 false
# looks correct

0.6.0-rc3.0> .!isna.(x)
3-element DataArrays.DataArray{Bool,1}:
 true
 true
 true
# return has become a DataArray{Bool}

0.6.0-rc3.0> y = isna.(x); .!y
3-element BitArray{1}:
 true
 true
 true
# return is a BitArray as expected

Possible explanation - the code expansion looks rather more complicated than expected:

0.6.0-rc3.0> expand(:(.!f.(x)))
:($(Expr(:thunk, CodeInfo(:(begin
        $(Expr(:thunk, CodeInfo(:(begin
        global ##19#20
        const ##19#20
        $(Expr(:composite_type, Symbol("##19#20"), :((Core.svec)()), :((Core.svec)()), :(Core.Function), :((Core.svec)()), false, 0))
        return
    end))))
        $(Expr(:method, false, :((Core.svec)((Core.svec)(##19#20, Core.Any), (Core.svec)())), CodeInfo(:(begin
        #temp#@_3 = f(#temp#@_2)
        return !#temp#@_3
    end)), false))
        #19 = $(Expr(:new, Symbol("##19#20")))
        SSAValue(0) = #19
        return (Base.broadcast)(SSAValue(0), x)
    end)))))

versus:

0.6.0-rc3.0> expand(:(f.(x)))
:((Base.broadcast)(f, x))

0.6.0-rc3.0> expand(:(.!y))
:((Base.broadcast)(!, y))

which possibly suggests it is an upstream issue, independent of DataArrays.

joshbode commented 7 years ago

Two workarounds:

0.6.0-rc3.0> .!begin isna.(x) end
3-element BitArray{1}:
 true
 true
 true

0.6.0-rc3.0> .!(_ = isna.(x))
3-element BitArray{1}:
 true
 true
 true
ararslan commented 7 years ago

The complex code expansion I believe is expected since this is a fusing dot operation. Note that you get something similar with expand(:(a .+ b .- c)), for example.

As for this bug (and the fact that those workarounds work!), that is bizarre...

joshbode commented 7 years ago

I'm not sure the expansion is correct... it involves the expression-head :composite_type for example, which sounds like the dot might be being interpreted as field access perhaps? Don't know enough about the parser, but looks strange to me.

Also, for your example, the same trick works:

0.6.0-rc3.0> expand(:(a .+ begin b .- c end))
:($(Expr(:thunk, CodeInfo(:(begin
        # meta: location REPL[50] # REPL[50], line 1:
        # meta: pop location
        SSAValue(0) = (Base.broadcast)(-, b, c)
        return (Base.broadcast)(+, a, SSAValue(0))
    end)))))

Reported upstream, in any case - I don't think this is specifically a DataArrays issue.

ararslan commented 7 years ago

Indeed. Thanks for the report. In the meantime I'll see if there's anything we can do here to work around the weird return type.

joshbode commented 7 years ago

No worries. In the meantime, I guess !isna.(x) does work - though !(::BitArray) is now deprecated (which is how I got here in the first place!).

ararslan commented 7 years ago

broadcast(!, isna.(x)) has the right return type (though oddly enough broadcast(!isna, x) does not), so you could do that for now if you want to avoid the deprecation warnings.

joshbode commented 7 years ago

Good idea - thanks

nalimilan commented 7 years ago

I think this is https://github.com/JuliaLang/julia/issues/19313. The problem is that multiple operations using the dot syntax are fused into a single one by the parser, so broadcast is not called on isna, but on x -> !isna(x), and the special-casing to return a BitArray rather than a DataArray{Bool} isn't used.

joshbode commented 7 years ago

Yep - looks like the same issue

ararslan commented 7 years ago

Unfortunately with fused broadcast there doesn't appear to be an easy way to specialize on a combination of functions, so we'll have to find another way to make .!isna.(x) return the desired BitArray...

nalimilan commented 7 years ago

I don't think there's a solution with the current system unfortunately. Except undeprecating isna(x)...

joshbode commented 7 years ago

In the meantime, apart from not being able to entirely control the return vector type, I'm appreciating how awesome the broadcast-dot-syntax and loop fusion is :)

ararslan commented 7 years ago

I don't think we should remove the implicitly vectorized isna deprecation, since then we're in conflict with how things are done in Base. Until we can figure something out, be it a change in Base for 0.7 or some way to work around it in 0.6, we can just recommend map(!, isna.(x)) (equivalent and shorter than broadcast(!, isna.(x))), I guess. :/

nalimilan commented 7 years ago

I don't think undeprecating vectorized isna is a real problem. Thanks to the deprecation, all places which could use the new syntax have been changed now, and people are familiar with it. So we could silently reintroduce it for this problematic case, and deprecate it after 0.7 will have provided a way to fix this. broadcast(!, isna.(x))) is kind of hard to discover.