Closed bkamins closed 3 years ago
Merging #40 (0e62aad) into main (708b8ba) will increase coverage by
1.28%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## main #40 +/- ##
==========================================
+ Coverage 91.30% 92.59% +1.28%
==========================================
Files 1 1
Lines 23 27 +4
==========================================
+ Hits 21 25 +4
Misses 2 2
Impacted Files | Coverage Δ | |
---|---|---|
src/DataAPI.jl | 92.59% <100.00%> (+1.28%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 708b8ba...0e62aad. Read the comment docs.
Makes sense. Maybe we should also say that Tables.jl is the owner of the function, and add a fallback definition there (that would notably work for vectors of names tuples)? I'm also not sure whether we should provide a docstring here, or in Tables.jl instead, given that we don't define any method here (cf. discussion regarding join functions).
@quinnj - is making Tables.jl an owner feasible? I was thinking about it also, but I was not sure.
An alternative would be to make the default implementation in DataAPI.jl that would return missing
(signaling that it is unknown)?
Then clearly Tables.jl could add extra methods in cases when nrow
or ncol
is known.
I'd rather throw a MethodError
than return missing
for unsupported types. The latter could be useful for table objects for which the number of rows cannot be known in advance (e.g. iterators of named tuples).
OK - makes sense. With the definitions I gave now MethodError
is thrown appropriately.
So I assume that if we merge and release it Tables.jl should add nrow
and ncol
definitions to the tables it ships. Right?
Ah - and I made the docsting, as as opposed to join*
case I think we have a fixed API here (as opposed to join*
where different packages can probably have different args/kwargs)
Just FYI: For a Tables.jl implementation one might look at what we have currently in MLJBase.jl (courtesy of @OkonSamuel): https://github.com/JuliaAI/MLJBase.jl/blob/f21d295d3f0714ec8f5dcc93d1aa34ccbf16fdff/src/interface/data_utils.jl#L77 .
I'm fine with adding these definitions here; I've mentioned elsewhere in various Tables.jl issues, however, that nrow
and ncol
aren't well-defined for all "tables" (in the Tables.jl source definition). You can have a row-iterator table with Base.SizeUnknown()
or a filtering generator of NamedTuples and you won't be able to call nrow
on those.
Now, I will say that's just the row-based table case. For column-based tables, we do expect the # of rows to be defined, and Tables.jl has Tables.rowcount
internally defined that works on any result of Tables.columns(x)
.
And so, I'd say I don't think I'm on board with having Tables.jl be the owner, because I'm afraid that will give the impression that nrow
is now supported on any table. It makes it tricky though because to rename Tables.rowcount
to nrow
, it kind of needs to be the fallback definition. It's a little messy.
So - in summary. What I propose now is just what is needed if I understand the comments correctly. Let us wait for @nalimilan to review.
@nalimilan - could you please review. After merging this I would make a 1.9 release. Thank you!
Thank you! Making a release
cc @OkonSamuel
Would not nrows
and ncols
be better names? It would be more consistent with Base (e.g. ndims
, nfields
, etc). (see also https://github.com/JuliaData/DataFrames.jl/issues/2247)
I think it is too late to change this for DataFrames.jl, and given we will stick to nrow
and ncol
there it is better to use a common naming scheme across packages.
Following the discussion in https://github.com/JuliaAI/ScientificTypes.jl/issues/137#issuecomment-885258513.
CC @quinnj @nalimilan @ablaom