JuliaML / TableTransforms.jl

Transforms and pipelines with tabular data in Julia
https://juliaml.github.io/TableTransforms.jl/stable
MIT License
103 stars 15 forks

ColSpec is broken for ZScore #157

Closed juliohm closed 1 year ago

juliohm commented 1 year ago

MWE:

using TableTransforms

(a=rand(Int,3), b=rand(3)) |> ZScore(:b)

AssertionError: columns must hold continuous variables

assert_continuous(::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{Float64}}})@assertions.jl:8
applyfeat(::TableTransforms.ZScore{TableTransforms.NameSpec}, ::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{Float64}}}, ::Nothing)@transforms.jl:175
apply@transforms.jl:131[inlined]
Transform@interface.jl:84[inlined]
|>(::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{Float64}}}, ::TableTransforms.ZScore{TableTransforms.NameSpec})@operators.jl:911
top-level scope@Local: 1[inlined]

vickydeka commented 1 year ago

I think this errors because ZScore only works on continuous variables, and the Int column a is not continuous.

Maybe the transform function can be used to apply the ZScore transformation only to the b column of the NamedTuple and to remove the a column from the result.

using TableTransforms

(a=rand(Int,3), b=rand(3)) |> transform(:b, ZScore)

juliohm commented 1 year ago

The root of the issue is known: the fallback method for all ColwiseTransform types performs its assertions on the whole input table instead of on the selected columns. That is why the Int column a triggers the error even though only b was selected.

Do you want to give it a try and fix it, @vickydeka? Here is the relevant function to modify:

https://github.com/JuliaML/TableTransforms.jl/blob/976ce71bf7dad543d70d18120323126fad3395d3/src/transforms.jl#L172-L208
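To illustrate the difference between asserting on the input table and asserting on the selection, here is a minimal sketch in plain Julia. It is not the code in transforms.jl: `iscontinuous`, `assert_continuous`, `zscore`, `zscore_buggy`, and `zscore_fixed` are hypothetical stand-ins written only with Base and the Statistics standard library, and the continuity check is approximated by testing for a floating-point element type.

```julia
using Statistics

# Hypothetical stand-in for the package's continuity check (assumption:
# a column counts as continuous when its element type is a float).
iscontinuous(col) = eltype(col) <: AbstractFloat

function assert_continuous(cols::NamedTuple)
  @assert all(iscontinuous, values(cols)) "columns must hold continuous variables"
end

# Standardize one column to zero mean and unit standard deviation.
zscore(x) = (x .- mean(x)) ./ std(x)

# Behavior reported in this issue: the assertion runs on the whole table,
# so an unselected Int column makes the transform fail.
function zscore_buggy(table::NamedTuple, names::Tuple{Vararg{Symbol}})
  assert_continuous(table)
  selection = NamedTuple{names}(table)
  merge(table, map(zscore, selection))
end

# Intended behavior: select first, then assert only on the selection.
function zscore_fixed(table::NamedTuple, names::Tuple{Vararg{Symbol}})
  selection = NamedTuple{names}(table)
  assert_continuous(selection)
  # Unselected columns pass through unchanged; selected ones are replaced.
  merge(table, map(zscore, selection))
end

table = (a=rand(Int, 3), b=rand(3))
zscore_fixed(table, (:b,))    # succeeds; column a is left untouched
# zscore_buggy(table, (:b,))  # throws AssertionError because of column a
```

In this sketch the only change between the two functions is the argument passed to the assertion, which mirrors the kind of change the fix needs.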

vickydeka commented 1 year ago

@juliohm I would definitely like to give it a try!

juliohm commented 1 year ago

Fixed by #160