apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.63k stars 3.56k forks

[Ruby] Unexpected behavior when building rows with an empty list #44742

Closed fpacanowski closed 3 days ago

fpacanowski commented 6 days ago

Describe the bug, including details regarding any error messages, version, and platform.

The issue occurs when the schema defines a list of structs. Here's a minimal example:

require 'arrow'
require 'parquet'

schema = Arrow::Schema.new(
  [
    Arrow::Field.new("structs", Arrow::ListDataType.new(
      Arrow::StructDataType.new([
        Arrow::Field.new("foo", :int64),
        Arrow::Field.new("bar", :int64)
      ])
    ))
  ]
)

# This works.
table = Arrow::RecordBatchBuilder.build(schema, [
  { structs: [{foo: 1, bar: 2}, {foo: 3, bar: 4}] },
  { structs: [{foo: 5, bar: 6}] }
]).to_table
table.save('file.parquet')

# This errors out.
table = Arrow::RecordBatchBuilder.build(schema, [
  { structs: [] },
  { structs: [] },
]).to_table
table.save('file.parquet')

I expected the second invocation to produce a table with two rows, each holding an empty list in the structs column. Instead, I got the following error:

/home/filip/.rvm/gems/ruby-3.3.0/gems/gobject-introspection-4.2.4/lib/gobject-introspection/loader.rb:715:in `invoke': [parquet][arrow][file-writer][write-table]: Invalid: Column 0: In chunk 0: Invalid: List child array invalid: Invalid: Struct child array #0 has length smaller than expected for struct array (0 < 2) (Arrow::Error::Invalid)
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/gobject-introspection-4.2.4/lib/gobject-introspection/loader.rb:715:in `invoke'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/gobject-introspection-4.2.4/lib/gobject-introspection/loader.rb:583:in `write_table'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-parquet-17.0.0/lib/parquet/arrow-table-savable.rb:41:in `block (2 levels) in save_as_parquet'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-arrow-17.0.0/lib/arrow/block-closable.rb:25:in `open'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-parquet-17.0.0/lib/parquet/arrow-table-savable.rb:38:in `block in save_as_parquet'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-arrow-17.0.0/lib/arrow/block-closable.rb:25:in `open'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-arrow-17.0.0/lib/arrow/table-saver.rb:115:in `open_raw_output_stream'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-parquet-17.0.0/lib/parquet/arrow-table-savable.rb:37:in `save_as_parquet'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-arrow-17.0.0/lib/arrow/table-saver.rb:77:in `save_to_file'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-arrow-17.0.0/lib/arrow/table-saver.rb:53:in `save'
    from /home/filip/.rvm/gems/ruby-3.3.0/gems/red-arrow-17.0.0/lib/arrow/table.rb:447:in `save'
    from repro.rb:27:in `<main>'

This is running on version 17.0.0 of red-arrow and red-parquet.
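
To make the error message easier to read, here is a rough plain-Ruby model of the invariant it reports. This is only a sketch: `StructColumn`, `ListColumn`, `entry_count` and `layout_error` are hypothetical names made up for illustration, not part of red-arrow, and the code does not claim to show how the builder works internally. It assumes the usual columnar layout, where a list column stores offsets into a struct child, and each struct field array must hold at least as many values as the struct array's length.

```ruby
# Hypothetical plain-Ruby model of a list<struct<foo, bar>> column.
# None of these names exist in red-arrow; they only illustrate the layout.
StructColumn = Struct.new(:entry_count, :children) # children: { name => Array }
ListColumn   = Struct.new(:offsets, :child)        # offsets index into the struct child

# Returns nil when the layout is consistent, otherwise a description of the problem.
def layout_error(list)
  struct = list.child
  struct.children.each_with_index do |(_name, values), index|
    if values.length < struct.entry_count
      return "Struct child array ##{index} has length smaller than expected " \
             "for struct array (#{values.length} < #{struct.entry_count})"
    end
  end
  if list.offsets.last > struct.entry_count
    return "List offsets point past the end of the struct child"
  end
  nil
end

# Layout I would expect for rows [[], []]: two empty lists, zero struct entries.
expected = ListColumn.new([0, 0, 0], StructColumn.new(0, { foo: [], bar: [] }))

# Layout matching the reported message: the struct claims 2 entries,
# but its field arrays hold no values.
reported = ListColumn.new([0, 0, 0], StructColumn.new(2, { foo: [], bar: [] }))

puts layout_error(expected).inspect
puts layout_error(reported)
```

Under these assumptions the first layout passes the check and the second yields the same "(0 < 2)" complaint as in the trace above, which suggests the struct array ended up with length 2 (one entry per row) even though both lists were empty.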

Component(s)

Ruby

kou commented 3 days ago

Issue resolved by pull request 44763 https://github.com/apache/arrow/pull/44763