JuliaStats / DataArrays.jl

DEPRECATED: Data structures that allow missing values
Other
53 stars 50 forks source link

Inference failure/indexing issues #276

Closed bramtayl closed 7 years ago

bramtayl commented 7 years ago

Maybe a DataFrames issue?

julia> d = DataFrame(a = [1, 2]);

julia> d[d[:a] .> 1, :]
1×1 DataFrames.DataFrame
│ Row │ a │
├─────┼───┤
│ 1   │ 2 │

julia> d[map(a -> a > 1, d[:a]), :]
ERROR: MethodError: no method matching getindex(::DataFrames.DataFrame, ::DataArrays.DataArray{Any,1}, ::Colon)
bramtayl commented 7 years ago

Ref

JeffBezanson commented 7 years ago

It looks like here: https://github.com/JuliaStats/DataArrays.jl/blob/513c046836f4e066feece28671be1c1acafa2329/src/datavector.jl#L106 map always returns an Any array.

ararslan commented 7 years ago

The difference is that broadcast is correctly inferring the element type to be a subtype of Real (in this case Bool), so it's going through this method. For whatever reason, map is incorrectly inferring the element type to be Any, and there isn't a method for that.

ararslan commented 7 years ago

Ugh, why did we define it like that...

ararslan commented 7 years ago

Okay well I'll work on a fix. Thanks for the heads up, @JeffBezanson.

JeffBezanson commented 7 years ago

I believe the default definition of map in Base works here:

julia> invoke(map, (Any,AbstractArray), x->x>1, a)
2-element DataArrays.DataArray{Bool,1}:
 false
  true

It uses similar to get a container of the same type, and iterator traits, plus the current officially-sanctioned tricks for getting the right element type. I spent a lot of time on it... :)

bramtayl commented 7 years ago

Should there also be a subsetting method defined for DataArrays of eltype any?

ararslan commented 7 years ago

no

bramtayl commented 7 years ago

There's also this inference failure:

a = @data [1, 2]
broadcast(x -> x > 1, a)
broadcast(x -> x > mean(a), a)::DataArrays.DataArray{Any,1}

a = [1, 2]
broadcast(x -> x > 1, a)
broadcast(x -> x > mean(a), a)::BitArray{1}
bramtayl commented 7 years ago

This infers correctly though. So I guess the compiler doesn't know mean is a pure function? Mean is indeed a pure function, right?

a = @data [1, 2]
Base.@pure mymean(x) = mean(x)
broadcast(x -> x > mymean(a), a)
nalimilan commented 7 years ago

@bramtayl I don't think purity is the right solution here. @pure should only be used in very specific cases. It seems that the fact that mean(a) has an inferred return type of Union{NAtype, Float64} is an issue for the compiler. Could you file a separate issue?

bramtayl commented 7 years ago

A dataarrays issue?

nalimilan commented 7 years ago

Yes.

nalimilan commented 7 years ago

See https://github.com/JuliaStats/DataArrays.jl/issues/282.