apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[JS] Include Node.js bindings to Apache Parquet C++ library #40963

Open mhkeller opened 6 months ago

mhkeller commented 6 months ago

Describe the enhancement requested

PyArrow's documentation describes how it handles reading and writing Parquet files:

Apache Arrow is an ideal in-memory transport layer for data that is being read or written with Parquet files. We have been concurrently developing the C++ implementation of Apache Parquet, which includes a native, multithreaded C++ adapter to and from in-memory Arrow data. PyArrow includes Python bindings to this code, which thus enables reading and writing Parquet files with pandas as well.

It would be useful if the Node.js library apache-arrow also had bindings to this C++ code. This would help the development of desktop apps written in Electron and, more generally, of data tools in the Node.js ecosystem.

Component(s)

JavaScript

simline commented 5 months ago

If there are any plans for this, WASM support for browsers would be better.

kou commented 5 months ago

We have Emscripten support. (WASI SDK isn't supported yet.)