Closed juliohm closed 3 years ago
Hi @juliohm.
As far as I was aware, columns
from Tables.jl is the function you would want for iterating table columns. The eachcol
funtion was added to Base
in order to iterate columns of Matrix
, similarly to eachrow
. In typed tables we model a table as an AbstractArray
so we must follow the AbstractArray
interfaces exported by Base
. Meanwhile Tables.jl exports rows
and columns
for these iterators. In fact, a typed table can be an AbstractMatrix
and you may wish to iterate those columns/rows via eachcol
and eachrow
.
It seems that DataFrames.jl makes a pun and uses eachcol
to get a DataFrameColumns
object - yet columns(df)
gives df
. We should open an issue with them.
Thank you @andyferris , makes a lot of sense.
OK let's see what they say - https://github.com/JuliaData/DataFrames.jl/issues/2636
By the way - if you want to make a DataFrame
from a TypedTables.Table
, doesn't DataFrame(table)
work?
Yes, it works. I was just reporting a behavior I thought was strange. I was
able to call Tables.matrix(t)
to get back a simple matrix inside a
calculation.
Em dom., 28 de fev. de 2021 às 18:11, Andy Ferris notifications@github.com escreveu:
Closed #68 https://github.com/JuliaData/TypedTables.jl/issues/68.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaData/TypedTables.jl/issues/68#event-4386542945, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3KEGQTVMABPMG4QT6TTBKWRXANCNFSM4YLKLAAQ .
columns from Tables.jl is the function you would want for iterating table columns.
This isn't completely accurate. As of the v1.0 Tables.jl release, Tables.columns(x)
is required to return an object that implements Tables.columnnames(obj)
and Tables.getcolumn(obj, nm::Symbol)
and Tables.getcolumn(obj, i::Int)
, but not iteration. A convenience wrapper, Tables.Columns(x)
can be used to turn any result of Tables.column
into an object that supports iteration (over columns), getindex, and getproperty.
Tables.jl exports rows and columns
They technically aren't exports (not physically exported), though they are indeed critical functions of the public API.
yet columns(df) gives df
This is a valid implementation of the Tables.jl interface because, as mentioned above, it supports Tables.columnnames
and Tables.getcolumn
methods.
Given Tables.columnnames
and Tables.getcolumn
isn't it trivial to define an iterator, though? What's the advantage of not requiring particular behavior from iterate
?
Or more to the point - does there exist a function in Tables.jl exist the helps you iterate
over columns of a table?
does there exist a function in Tables.jl exist the helps you iterate over columns of a table
Yes, as mentioned, if you do cols = Tables.Columns(Tables.columns(input))
, the Columns
object you get back will iterate columns. It also supports operations like cols.col1
, or cols[:col1]
or cols[1]
, making it "dataframe-like" in a way. There have been a few discussions among JuliaData devs around simplifying so that Tables.Columns
automatically calls Tables.columns
for your underneath, since it seems quite redundant to users who aren't familiar with the distinction.
Ok, I see, thanks Jacob. Have you considered having a generic function that returns an object with this stated behaviour, rather than a constructor for a particular type? Like the difference between view
vs SubArray
? It’s more interface based and generic that way.
E.g. by default return to users a Columns
from columns
, and have package authors implement either some simpler function (e.g. _columns
) or the full columns
as they prefer?
Not really, but maybe it's worth considering. I kind of like keeping the distinction that Tables.columns
returns only an object that satisfies the interface, while Tables.Columns
would return a specific object that not only satisfied the interface, but was dispatchable and had additional convenient behavior.
This is affecting downstream code where I used to use DataFrames.jl.