JuliaAI / ScientificTypes.jl

An API for dispatching on the "scientific" type of data instead of the machine type

tuple/table ambiguity #182

Open ablaom opened 2 years ago

ablaom commented 2 years ago

The scitype of a tuple is intended to be the Tuple of the element scitypes. For example:

julia> scitype((1.0, 4))
Tuple{Continuous, Count}

By this logic, if I create a 1-tuple with a table t as its single element, then this tuple should have scitype Tuple{scitype(t)}. But this isn't always the case:

t = (x=[1, 2], y=["a", "b"])

julia> scitype(t)
Table{Union{AbstractVector{Count}, AbstractVector{Textual}}}

julia> scitype((t,))
Table{Union{AbstractVector{AbstractVector{Count}}, AbstractVector{AbstractVector{Textual}}}}

The problem is that (t,) is itself also a table (with one row), so scitype treats it as a table rather than as a tuple:

julia> schema((t,))
┌───────┬─────────────────────────┬────────────────┐
│ names │ scitypes                │ types          │
├───────┼─────────────────────────┼────────────────┤
│ x     │ AbstractVector{Count}   │ Vector{Int64}  │
│ y     │ AbstractVector{Textual} │ Vector{String} │
└───────┴─────────────────────────┴────────────────┘

This is pretty awful 😢. For example, it makes it tricky in MLJBase to use the fit_data_scitype of models to check compatibility of a model with data, as in https://github.com/JuliaAI/MLJBase.jl/pull/731. That is, the test scitype(data) <: fit_data_scitype(model), where data is the tuple of data arguments, is not reliable.
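One way to sidestep the ambiguity in a check like this is to compute the scitype of the data tuple element by element, so the tuple is never itself inspected as a possible table. This is a minimal sketch only: tuple_scitype is a hypothetical helper, not part of ScientificTypes or MLJBase, and it assumes data is always an ordinary tuple of data arguments.

```julia
using ScientificTypes

# Hypothetical helper (not part of ScientificTypes): build the tuple scitype
# from the element scitypes, without asking whether the tuple is a table.
tuple_scitype(data::Tuple) = Tuple{map(scitype, data)...}

t = (x=[1, 2], y=["a", "b"])

# Agrees with the intended rule for ordinary tuples ...
tuple_scitype((1.0, 4)) == Tuple{Continuous, Count}

# ... and also for a 1-tuple whose single element is a table.
tuple_scitype((t,)) == Tuple{scitype(t)}
```

With such a helper, the compatibility test would read tuple_scitype(data) <: fit_data_scitype(model) instead of scitype(data) <: fit_data_scitype(model).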

ablaom commented 2 years ago

cc @pazzo83

pazzo83 commented 2 years ago

Ah so this is why my tests were failing?

ablaom commented 2 years ago

No, I now think that the MLJBase PR is (by accident?) actually avoiding this issue. See https://github.com/JuliaAI/MLJBase.jl/pull/731#issuecomment-1021891466.

Still, this issue could turn up unexpectedly elsewhere.