apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[JS] Include Node.js bindings to Apache Parquet C++ library #40963

Open mhkeller opened 7 months ago

mhkeller commented 7 months ago

Describe the enhancement requested

PyArrow's documentation describes how it handles reading and writing Parquet files:

Apache Arrow is an ideal in-memory transport layer for data that is being read or written with Parquet files. We have been concurrently developing the C++ implementation of Apache Parquet, which includes a native, multithreaded C++ adapter to and from in-memory Arrow data. PyArrow includes Python bindings to this code, which thus enables reading and writing Parquet files with pandas as well.

It would be useful if the Node.js library apache-arrow also had bindings to this C++ code. That would help with building desktop apps in Electron and, more generally, with building data tools in the Node.js ecosystem.

Component(s)

JavaScript

simline commented 6 months ago

If there is any plan for this, WASM support for browsers would be even better.

kou commented 6 months ago

We have Emscripten support. (WASI SDK isn't supported yet.)