Closed ablaom closed 1 year ago
According to local testing, this PR will resolve https://github.com/JuliaAI/MLJText.jl/issues/25 .
@bkamins Be great if you could have a look at this, thanks.
Merging #189 (fb34924) into dev (6ab77cf) will increase coverage by
0.04%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## dev #189 +/- ##
==========================================
+ Coverage 93.50% 93.54% +0.04%
==========================================
Files 6 6
Lines 431 434 +3
==========================================
+ Hits 403 406 +3
Misses 28 28
Impacted Files | Coverage Δ | |
---|---|---|
src/ScientificTypes.jl | 100.00% <100.00%> (ø) |
|
src/convention/scitype.jl | 93.18% <100.00%> (ø) |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
Apart from a comment related to https://github.com/JuliaData/Tables.jl/issues/309 I have a general design question that I think would be worth discussing.
The issue is that Tables.istable
is only a one-way promise:
Check if an object has specifically defined that it is a table. Note that not all valid tables will return true, since it's possible to satisfy the Tables.jl interface at "run-time", e.g. a Generator of NamedTuples iterates NamedTuples, which satisfies the AbstractRow interface, but there's no static way of knowing that the generator is a table.
This means that is we see true
we know that something is a table, but when we see false
it does not mean it is not a table. This is indeed unfortunate.
The problem is that some MLJ.jl users can use a table for which Tables.istable
returns false
. Then they have a problem because your current mechanisms will not let them such a table be treated as a table, but rather it will get :other
treatment.
You know the MLJ.jl ecosystem much better than me 😄, so I do not feel qualified to give informed proposals what to do with this, but clearly the sentence:
any table type
T
supported by Tables.jl
is not true currently. It probably should be made more precise (if nothing is changed) to saying that table is only considered something for which Tables.istable
returns true
except these two excluded cases. The other solution could be that a separate method would be for something that user explicitly wants to be considered as table, and separate for things not to be considered tables (but I am not sure if this is feasible or convinient, it also might be breaking; so to do not treat this suggestion as something strong).
Given the discussion in https://github.com/JuliaData/Tables.jl/issues/309 and Tables.jl 1.10.0 release the PR should be updated to AbstractString
.
@bkamins Thanks for the review and comments. I have now updated the excluded key type from String
to AbstractString
.
It probably should be made more precise (if nothing is changed) to saying that table is only considered something for which Tables.istable returns true except these two excluded cases.
I have clarified the ScientificTypes.jl documentation to that effect, and raised an issue to do the same in MLJ docs.
As far as allowing tables in MLJ for which Tables.istable
is false
, I can't think of anything better than adding a wrapper for objects to declare them as bona fide tables, with Tables.jl methods passing through, which I guess is what you are suggesting. I suppose the main use case in ML for these tables would come from tabular data not fitting into memory. MLJ hasn't really been designed to train using out-of-memory datasets, so this probably fit's into the broader design discussion at, e.g., https://github.com/JuliaML/MLUtils.jl/issues/61 .
This PR will:
String
keysString
keysscitype
docstring (#185)