JuliaAI / ScientificTypes.jl

An API for dispatching on the "scientific" type of data instead of the machine type

Tables that wrap a `CategoricalMatrix` do not have the correct scitype #146

Closed. ablaom closed 3 years ago

ablaom commented 3 years ago

For a table wrapping a matrix of floats, `scitype` behaves as expected:

julia> using ScientificTypes, Tables

julia> X = rand(5, 3) |> Tables.table
Tables.MatrixTable{Matrix{Float64}} with 5 rows, 3 columns, and schema:
 :Column1  Float64
 :Column2  Float64
 :Column3  Float64

julia> scitype(X)
Table{AbstractVector{Continuous}}

But if the wrapped matrix is categorical instead, we get the wrong scitype:

julia> X = coerce(rand("abc", 5, 3), Multiclass) |> Tables.table
Tables.MatrixTable{CategoricalArrays.CategoricalMatrix{Char, UInt32, Char, CategoricalArrays.CategoricalValue{Char, UInt32}, Union{}}} with 5 rows, 3 columns, and schema:
 :Column1  CategoricalArrays.CategoricalValue{Char, UInt32}
 :Column2  CategoricalArrays.CategoricalValue{Char, UInt32}
 :Column3  CategoricalArrays.CategoricalValue{Char, UInt32}

julia> schema(X)
┌─────────┬────────────────────────────────┬───────────────┐
│ _.names │ _.types                        │ _.scitypes    │
├─────────┼────────────────────────────────┼───────────────┤
│ Column1 │ CategoricalValue{Char, UInt32} │ Multiclass{3} │
│ Column2 │ CategoricalValue{Char, UInt32} │ Multiclass{3} │
│ Column3 │ CategoricalValue{Char, UInt32} │ Multiclass{3} │
└─────────┴────────────────────────────────┴───────────────┘
_.nrows = 5

julia> scitype(X)
Table{AbstractMatrix{Multiclass{3}}}

I was expecting `Table{AbstractVector{Multiclass{3}}}` here, since the schema shows every column has scitype `Multiclass{3}`.