Open NHDaly opened 2 years ago
Hmmm, yeah, this shouldn't be too bad to support. I think the easiest approach would be to hook into the Tables.jl interface for this. We could create a pseudo-table type like:
    struct ArrayOfArraysTable{T}
        source::T
    end
    Tables.columns(x::ArrayOfArraysTable) = x
    Tables.getcolumn(x::ArrayOfArraysTable, i::Int) = x.source[i]
So that should mostly work on the tables side of things in terms of the data. For the schema message writing, we'll get `Tables.schema(x::ArrayOfArraysTable) = nothing` as the fallback, so I think then we just need another overload for `makeschema(b, sch::Nothing, columns)`, where we create the schema message but with no column names.
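For illustration only, here is a slightly fuller sketch of the wrapper type described above, filled in far enough to run against Tables.jl on its own. The `istable`, `columnaccess`, and `columnnames` definitions, the `Symbol` method of `getcolumn`, and the positional placeholder names are assumptions added for this sketch and are not something Arrow.jl provides. The `makeschema` overload is left out, because it would have to live inside Arrow.jl.

```julia
using Tables

# Hypothetical wrapper: treats a collection of equal-length vectors as columns.
struct ArrayOfArraysTable{T}
    source::T
end

# Assumed interface declarations so Tables.jl treats the wrapper as a column table.
Tables.istable(::Type{<:ArrayOfArraysTable}) = true
Tables.columnaccess(::Type{<:ArrayOfArraysTable}) = true
Tables.columns(x::ArrayOfArraysTable) = x

# Assumption: Tables.jl consumers expect some names, so positional placeholders
# (:1, :2, ...) are generated here; the real columns are unnamed.
Tables.columnnames(x::ArrayOfArraysTable) = [Symbol(i) for i in 1:length(x.source)]

# Positional access, as in the comment above.
Tables.getcolumn(x::ArrayOfArraysTable, i::Int) = x.source[i]
# Name-based access maps the placeholder name back to its position.
Tables.getcolumn(x::ArrayOfArraysTable, nm::Symbol) = x.source[parse(Int, String(nm))]

# Example data: two unnamed columns (movie IDs and titles).
t = ArrayOfArraysTable(Any[[1, 2, 3], ["a", "b", "c"]])
ids = Tables.getcolumn(t, 1)
titles = Tables.getcolumn(t, Symbol("2"))
```

Because no `Tables.schema` method is defined for the wrapper, `Tables.schema(t)` falls back to `nothing`, which is the case the proposed `makeschema` overload would need to handle.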
We have a data source (Relations from our database engine at RelationalAI) that has columnar data, but without column names. (We represent a Relation as a Set of Tuples, e.g. `movie_title` relates movie IDs to Titles, so the positions are meaningful but they do not have names.)
We would like to encode this in Arrow as essentially a Vector of columns. In JSON, we would encode this as:
From what I can tell, this is supported by the Arrow spec, but isn't currently supported by the Arrow.jl package?
This is the understanding my colleague and I have come to of the current situation:
`Symbol("1")` (which is a bit cumbersome to work with in Julia).
Can we work to expose this ability through the Arrow.jl package as well, in the code to construct an Arrow stream from a column-wise data source?
Thanks!
CC: @bachdavi