apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

IPC Reader panics on data created by Arrow JavaScript #6415

Open kylebarron opened 6 days ago

kylebarron commented 6 days ago

Describe the bug

The `unwrap()` on this line panics: https://github.com/apache/arrow-rs/blob/5414f1d7c0683c64d69cf721a83c17d677c78a71/arrow-ipc/src/convert.rs#L98

panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-ipc-53.0.0/src/convert.rs:98:30:
called `Option::unwrap()` on a `None` value

This happens on list array data created by Arrow JavaScript, where the inner list field does not have a name set. The name is not required to be set.
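
To illustrate the failure described above, here is a minimal sketch using only the standard library. `FieldMeta`, `name_strict`, and `name_lenient` are hypothetical stand-ins invented for this example; they are not the actual arrow-ipc types or functions. The sketch only assumes that the field name is exposed as an `Option`, which is what the panic message suggests.

```rust
// Hypothetical stand-in for a decoded IPC field; NOT the real arrow-ipc type.
struct FieldMeta<'a> {
    // `None` models a writer (such as Arrow JavaScript here) that omitted the name.
    name: Option<&'a str>,
}

// Mirrors the failing pattern: panics when the name is absent.
fn name_strict(field: &FieldMeta) -> String {
    field.name.unwrap().to_string()
}

// Lenient alternative: a missing name is treated as an empty string.
fn name_lenient(field: &FieldMeta) -> String {
    field.name.unwrap_or("").to_string()
}

fn main() {
    let unnamed = FieldMeta { name: None };
    let named = FieldMeta { name: Some("item") };

    // The lenient accessor never panics.
    println!("{:?}", name_lenient(&unnamed));
    println!("{:?}", name_lenient(&named));

    // The strict accessor panics on the unnamed field; catch it to show the outcome.
    let result = std::panic::catch_unwind(|| name_strict(&FieldMeta { name: None }));
    println!("strict accessor panicked: {}", result.is_err());
}
```

Falling back to an empty string is one possible choice, matching how pyarrow displays the unnamed child; returning an error instead of panicking would be another option.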

To Reproduce

The following JavaScript code creates the IPC buffer (output data attached as data.arrows.zip):

import { tableFromArrays, tableToIPC } from "apache-arrow"
import { Table } from "parquet-wasm"

const table = tableFromArrays({
  column: [[1, 2], [3, 4]],
})
const ipc = tableToIPC(table, "stream")
// This loads the IPC buffer using arrow-rs
Table.fromIPCStream(ipc)

Loading this with pyarrow shows that the inner list field has no name set (which pyarrow infers as an empty string):

In [1]: import pyarrow as pa

In [3]: pa.ipc.open_stream("data.arrows").read_all()
Out[3]:
pyarrow.Table
column: list<: double>
  child 0, : double
----
column: [[[1,2],[3,4]]]

Expected behavior

The IPC reader should not panic when a field name is missing.

Additional context

Originally reported in https://github.com/kylebarron/parquet-wasm/issues/606

alamb commented 6 days ago

Makes sense to me that the reader should not panic.