apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[JS] TypeError with utf8 and JSONVectorLoader.readData #22931

Closed asfimport closed 5 years ago

asfimport commented 5 years ago

Minimal repro:

 


const { Table } = require('apache-arrow');

const fields = [
  {
    name: 'first_name',
    type: {name: 'utf8'},
    nullable: false,
    children: [],
  },
];

Table.from({
  schema: {fields},
  batches: [{
    count: 1,
    columns: [{
      name: 'first_name',
      count: 1,
      VALIDITY: [],
      DATA: ['Fred']
    }]
  }]
});

 

Output:


/[snip]/node_modules/apache-arrow/visitor/vectorloader.js:92
    readData(type, { offset } = this.nextBufferRange()) {
                     ^
TypeError: Cannot destructure property `offset` of 'undefined' or 'null'.
    at JSONVectorLoader.readData (/[snip]/node_modules/apache-arrow/visitor/vectorloader.js:92:38)
    at JSONVectorLoader.visitUtf8 (/[snip]/node_modules/apache-arrow/visitor/vectorloader.js:46:188)
    at JSONVectorLoader.visit (/[snip]/node_modules/apache-arrow/visitor.js:28:48)
    at JSONVectorLoader.visit (/[snip]/node_modules/apache-arrow/visitor/vectorloader.js:40:22)
    at nodes.map (/[snip]/node_modules/apache-arrow/visitor.js:25:44)
    at Array.map (<anonymous>)
    at JSONVectorLoader.visitMany (/[snip]/node_modules/apache-arrow/visitor.js:25:22)
    at RecordBatchJSONReaderImpl._loadVectors (/[snip]/node_modules/apache-arrow/ipc/reader.js:523:107)
    at RecordBatchJSONReaderImpl._loadRecordBatch (/[snip]/node_modules/apache-arrow/ipc/reader.js:209:79)
    at RecordBatchJSONReaderImpl.next (/[snip]/node_modules/apache-arrow/ipc/reader.js:280:42)

 

 

Looks like the nextBufferRange call is returning undefined, due to an out-of-bounds buffersIndex.

 

Happy to provide more info if needed. Seems to only affect utf8 types and nothing else.

 

Environment: node v10.16.0, OSX 10.14.5

Reporter: Adam M Krebs / @akre54

Note: This issue was originally created as ARROW-6574. Please see the migration documentation for further details.

asfimport commented 5 years ago

Adam M Krebs / @akre54: Ah. Looks like I need to add an OFFSET array.

Is there documentation on this spec? Or a preferred way to turn JS objects into arrow?
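
For anyone hitting the same error, here is a rough sketch of what the missing OFFSET array could look like for a utf8 column. It assumes the JSON integration-test layout, where OFFSET holds cumulative UTF-8 byte offsets (one more entry than there are values) and VALIDITY holds one 0/1 flag per value. The helper name `utf8JsonColumn` is hypothetical and not part of apache-arrow.

```javascript
// Hypothetical helper (not an apache-arrow API): builds one utf8 column object
// in the JSON integration-test layout, including the OFFSET array.
function utf8JsonColumn(name, values) {
  const offsets = [0];
  const validity = [];
  const data = [];
  for (const value of values) {
    const isValid = value !== null && value !== undefined;
    // Assumption: null slots are written as empty strings with a 0 validity flag.
    const text = isValid ? String(value) : '';
    validity.push(isValid ? 1 : 0);
    data.push(text);
    // Each offset is the running total of UTF-8 byte lengths so far.
    offsets.push(offsets[offsets.length - 1] + Buffer.byteLength(text, 'utf8'));
  }
  return { name, count: values.length, VALIDITY: validity, OFFSET: offsets, DATA: data };
}

const column = utf8JsonColumn('first_name', ['Fred']);
console.log(JSON.stringify(column));
// prints {"name":"first_name","count":1,"VALIDITY":[1],"OFFSET":[0,4],"DATA":["Fred"]}
```

The resulting object would take the place of the entry in `columns` from the repro above. Offsets are counted in bytes rather than characters, so a multi-byte character such as 'é' advances the offset by 2.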

asfimport commented 5 years ago

Paul Taylor / @trxcllnt: @akre54 This is the JSON IPC format, which is only suitable for integration tests between the different Arrow implementations.

You can use the Vector Builders to encode arbitrary JS objects into Arrow Vectors and Tables.

The raw Builder APIs allow you to control every aspect of the chunking and flushing behavior, but as a consequence are relatively low-level. There are higher-level APIs for transforming values from iterables, async iterables, node streams, or DOM streams. You can see examples of usage in the tests here, or see this example converting a CSV row stream to Arrow.

Lastly, if your values are already in memory, you can call Vector.from() with an Arrow type and an iterable (or async-iterable) of JS values, and it'll use the Builders to return a Vector of the specified type:


const Arrow = require('apache-arrow');
const { Vector, Float32Vector } = Arrow;

// create from a list of numbers or a Float32Array (zero-copy) -- all values will be valid
const f32 = Float32Vector.from([1.1, 2.5, 3.7]);

// or a different style, handy if inferring the types at runtime
// values in the `nullValues` array will be treated as NULL, and written in the validity bitmap
const f32WithNulls = Vector.from({
  nullValues: [-1, NaN],
  type: new Arrow.Float32(),
  values: [1.1, -1, 2.5, 3.7, NaN],
});
// ^ result: [1.1, null, 2.5, 3.7, null]

// or with values from an AsyncIterator (the `await` must run inside an async function)
const f32Async = await Vector.from({
  type: new Arrow.Float32(),
  values: (async function*() { yield* [1.1, 2.5, 3.7]; }())
});