apache / arrow-julia

Official Julia implementation of Apache Arrow
https://arrow.apache.org/julia/
Other
283 stars 60 forks source link

Ensure that ArrowTypes.default is defined for Vararg tuples #466

Closed quinnj closed 1 year ago

quinnj commented 1 year ago

Fixes #461. This is the other proposal from what I originally suggested. Though Tuple{Varag} is never the concrete type of a value in Julia, it comes up when dealing with structs with fields that have Vararg types; not common at all, but it's allowed and happens.

The reflection code here is a little gross, but it seems to be what Base Julia uses in similar queries to see if a tuple has a vararg element. I've commented out the Arrow/runtests test for now until we bump another version and can then uncomment. Unit tests are added for ArrowTypes in the meantime.

codecov-commenter commented 1 year ago

Codecov Report

Merging #466 (b56063e) into main (5532270) will increase coverage by 0.04%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #466      +/-   ##
==========================================
+ Coverage   87.53%   87.57%   +0.04%     
==========================================
  Files          26       26              
  Lines        3272     3276       +4     
==========================================
+ Hits         2864     2869       +5     
+ Misses        408      407       -1     
Impacted Files Coverage Δ
src/ArrowTypes/test/tests.jl 100.00% <ø> (ø)
src/ArrowTypes/src/ArrowTypes.jl 87.70% <100.00%> (+0.85%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

Moelf commented 1 year ago

just curious what does this serialize to in terms of Arrow types? I guess logically it's not different from a list since the length is not known (someone can serialize a column with different length of vargs in each row), thus not a fixed size list ni Arrow-land?

quinnj commented 1 year ago

As I noted above, Tuple{Vararg} actually isn't a concrete type of any kind of value in Julia, so it doesn't really make sense to talk about what it translates to in the arrow format. At runtime, you will always have some kind of concrete NTuple{N, T}, based on how many elements there actually are. Obviously, this could be pretty tricky to serialize to arrow, however, if you had, for example, a Vector{Tuple{Vararg{String}}}, with different sized tuples. We could potentially try to switch to treating those as lists instead of fixed-size lists, but it's such a weird/oddball type that it doesn't seem worth it unless someone has a real use-case for it.

Moelf commented 1 year ago

Tuple{Vararg} actually isn't a concrete type of any kind of value in Julia, so it doesn't really make sense to talk about what it translates to in the arrow format.

I don't see how this follows, surely we can talk about how much information/promise a type makes, and see what's the most precise Arrow type composition we can map to. In this case, if all you see is a Vararg{T}, it's making the same amount of guarantee (as far as serialization to/from bytes is concerned) as Vector{T} no?


We could potentially try to switch to treating those as lists instead of fixed-size lists

I guess right now we would error when people have a column of Vector{Vararg{T}} of varying length, that's fine, we can throw error and ask user to explicitly convert to Vector, somehow, which would be annoying if it's a field in a struct. For (a contrived) example:

julia> col1 = [v"2.0-rc1.x", v"2.0-rc1.x.y"]
2-element Vector{VersionNumber}:
 v"2.0.0-rc1.x"
 v"2.0.0-rc1.x.y"

Arrow types certainly have enough flexibility to represent these.