apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[JS] Improve JS documentation on how to read/deserialize arrow data #37856

Open bluehat974 opened 12 months ago

bluehat974 commented 12 months ago

Describe the enhancement requested

cc @domoritz

The current JS documentation does not clearly explain how to read and manipulate data from Apache Arrow JS.

The JS version of Apache Arrow is used in several JS environments (DuckDB Wasm, ObservableHQ, Arquero), and people are asking how to properly read the data, but there is no clear answer: https://github.com/duckdb/duckdb-wasm/pull/1418

There is some documentation on reading Arrow data or deserializing it to JSON: https://duckdb.org/docs/api/wasm/query.html#arrow-table-to-json https://observablehq.com/@theneuralbit/using-apache-arrow-js-with-large-datasets

but these examples should be consolidated into the official Apache Arrow JS documentation: https://github.com/apache/arrow/blob/main/js/README.md

Some ideas for code examples to add to the documentation:

Component(s)

Documentation, JavaScript

kevinschaich commented 4 months ago

100% agree with the points mentioned above. I'm also curious whether there is built-in Arrow functionality to handle casting to native JavaScript types.

My workaround:

import { Table } from 'apache-arrow'
import { mapValues } from 'lodash'

export const arrowTableToRecords = (arrow: Table): Record<string, any>[] => {
    // The row.toJSON() approach below does not handle BigInts, and the prototype
    // can't be overridden because it refers to a private symbol:
    // const after = arrow.toArray().map((row) => row.toJSON())

    return arrow.toArray().map((obj: object) => {
        return mapValues(obj, (v: any) => {
            if (typeof v === 'bigint') {
                if (v < Number.MIN_SAFE_INTEGER || v > Number.MAX_SAFE_INTEGER) {
                    throw new TypeError(`${v} is not safe to convert to a number.`)
                }
                return Number(v)
            }
            return v
        })
    })
}

LMK if others have a better way to do this.
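
For comparison, the same idea can be sketched without the lodash dependency. This is only a minimal sketch, not built-in Arrow functionality: `rowsToPlainNumbers` and the `Row` type are hypothetical names, and it assumes the rows have already been turned into plain objects (for example via the `row.toJSON()` call mentioned in the comment above).

```typescript
// Hypothetical helper, not part of apache-arrow.
// Assumes each row is already a plain object keyed by column name.
type Row = Record<string, unknown>

function rowsToPlainNumbers(rows: Row[]): Row[] {
    return rows.map((row) => {
        const out: Row = {}
        for (const key of Object.keys(row)) {
            const value = row[key]
            if (typeof value === 'bigint') {
                // Mixed bigint/number comparison is allowed in JavaScript.
                if (value < Number.MIN_SAFE_INTEGER || value > Number.MAX_SAFE_INTEGER) {
                    throw new RangeError(`${key}: ${value} is not safe to convert to a number.`)
                }
                out[key] = Number(value)
            } else {
                // Non-bigint values are passed through unchanged.
                out[key] = value
            }
        }
        return out
    })
}
```

Like the workaround above, this throws instead of silently losing precision, so callers have to decide what to do with 64-bit values outside the safe integer range.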

domoritz commented 4 months ago

I'm thinking about adding a way to tell arrow that you want data to be returned in more compatible types (e.g. arrays of numbers instead of bigints, numbers instead of decimal objects). It's not there yet but I think toArray is often not generating what people want.
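
To make the "arrays of numbers instead of bigints" idea concrete, here is a rough sketch of what such a conversion could look like in user code today. It is not an Arrow API and not the proposed design: `bigIntArrayToNumbers` and its `onUnsafe` option are hypothetical, and it assumes the 64-bit column values are already available as a `BigInt64Array` or `BigUint64Array`.

```typescript
// Hypothetical helper, not part of apache-arrow.
// Converts a 64-bit integer typed array into a Float64Array of plain numbers.
function bigIntArrayToNumbers(
    values: BigInt64Array | BigUint64Array,
    onUnsafe: 'throw' | 'lossy' = 'throw'
): Float64Array {
    const out = new Float64Array(values.length)
    for (let i = 0; i < values.length; i++) {
        const v = values[i]
        const unsafe = v < Number.MIN_SAFE_INTEGER || v > Number.MAX_SAFE_INTEGER
        if (unsafe && onUnsafe === 'throw') {
            throw new RangeError(`Value ${v} at index ${i} is not safe to convert to a number.`)
        }
        // With 'lossy', out-of-range values are rounded to the nearest double.
        out[i] = Number(v)
    }
    return out
}
```

Making the precision trade-off an explicit option keeps the default strict while still letting callers opt into lossy conversion when approximate values are acceptable.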