delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
1.98k stars 364 forks

Javascript (server side) bindings via n-api #2342

Open universalmind303 opened 3 months ago

universalmind303 commented 3 months ago

Description

I'd like an SDK similar to the Python SDK, but available in server-side JS (Node, Deno, Bun, ...). The SDK should closely resemble the Python one and deviate only where N-API limitations make it necessary or where the Python design would be unidiomatic in JS.

napi.rs is very similar in nature to pyo3 and is relatively easy to use.

Use Case

I want to read Delta files from a JS application.

Related Issue(s)

https://github.com/pola-rs/nodejs-polars/issues/176

universalmind303 commented 3 months ago

@rtyler I'd be happy to contribute to this one. Feel free to assign it to me.

ion-elgreco commented 3 months ago

@universalmind303 I tried creating a reader with Polars in Rust directly, but the Scan is not flexible enough to read schema-evolved tables, unfortunately :(

So curious to see how you are planning to do this with polars 😄

universalmind303 commented 3 months ago

For Polars, there is a way to push some operations down to the data source during planning: AnonymousScan. It doesn't support all of the pushdowns yet, but it can likely be improved to support the same pushdowns as Parquet. If it's all wired up correctly, the planner should handle the schema evolution for you.
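As a rough illustration of what "pushing operations down to the data source" means, here is a minimal sketch in plain TypeScript. It does not use the Polars API: `ScanOptions` and `InMemorySource` are hypothetical names made up for this example, and the in-memory arrays stand in for data files. The source applies the projection, predicate, and row limit while reading, and fills a requested column with null when a file does not contain it.

```typescript
type Row = Record<string, unknown>;

// Hypothetical options a query planner might hand to a data source.
interface ScanOptions {
  columns?: string[];                // projection pushdown
  predicate?: (row: Row) => boolean; // predicate pushdown
  limit?: number;                    // row-limit (slice) pushdown
}

// Hypothetical source; each inner array stands in for one data file.
class InMemorySource {
  private readonly files: Row[][];

  constructor(files: Row[][]) {
    this.files = files;
  }

  scan(opts: ScanOptions = {}): Row[] {
    const out: Row[] = [];
    for (const file of this.files) {
      for (const row of file) {
        // Stop reading as soon as the requested number of rows is reached.
        if (opts.limit !== undefined && out.length >= opts.limit) return out;
        // Filter on the full row, before any columns are dropped.
        if (opts.predicate && !opts.predicate(row)) continue;
        if (opts.columns) {
          const projected: Row = {};
          // Columns missing from an older file come back as null.
          for (const c of opts.columns) projected[c] = c in row ? row[c] : null;
          out.push(projected);
        } else {
          out.push({ ...row });
        }
      }
    }
    return out;
  }
}

// Two "files": the second was written after a `score` column was added.
const source = new InMemorySource([
  [{ id: 1, name: "a" }, { id: 2, name: "b" }],
  [{ id: 3, name: "c", score: 0.5 }],
]);

const result = source.scan({
  columns: ["id", "score"],
  predicate: (r) => (r.id as number) >= 2,
  limit: 10,
});
console.log(result);
```

The point of the sketch is only that the source, not the caller, decides how to satisfy the requested columns, which is where schema differences between files can be absorbed.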

ion-elgreco commented 3 months ago

@universalmind303 I tried using a LogicalPlan::FileScan for Parquet, but Polars errors out during collection, since it does a plain concat of the DataFrames instead of a diagonal concat. Also, if you provide a schema to the reader and a column is not in the Parquet file, it errors out instead of creating a null array.

Here you can see an old branch where I was playing around with the scan: https://github.com/ion-elgreco/polars-deltalake/blob/feat/cloud_reads/python/src/lib.rs
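The difference between a plain (vertical) concat and a diagonal concat described above can be sketched in a few lines of plain TypeScript. This is an illustration only, not Polars code: `Frame`, `verticalConcat`, and `diagonalConcat` are hypothetical names, and the frames are simple row arrays. The vertical version rejects frames whose columns differ, while the diagonal version takes the union of all columns and fills the gaps with null.

```typescript
type Value = string | number | boolean | null;
type Row = Record<string, Value>;

// Hypothetical, simplified stand-in for a DataFrame.
interface Frame {
  columns: string[];
  rows: Row[];
}

// Vertical concat: every frame must have exactly the same columns.
function verticalConcat(frames: Frame[]): Frame {
  if (frames.length === 0) return { columns: [], rows: [] };
  const columns = frames[0].columns;
  const rows: Row[] = [];
  for (const f of frames) {
    const same =
      f.columns.length === columns.length &&
      f.columns.every((c, i) => c === columns[i]);
    if (!same) {
      throw new Error(
        `schema mismatch: [${f.columns.join(", ")}] vs [${columns.join(", ")}]`
      );
    }
    for (const row of f.rows) rows.push({ ...row });
  }
  return { columns: [...columns], rows };
}

// Diagonal concat: union of all columns, missing values become null.
function diagonalConcat(frames: Frame[]): Frame {
  const columns: string[] = [];
  for (const f of frames) {
    for (const c of f.columns) {
      if (!columns.includes(c)) columns.push(c);
    }
  }
  const rows: Row[] = [];
  for (const f of frames) {
    for (const row of f.rows) {
      const filled: Row = {};
      for (const c of columns) filled[c] = c in row ? row[c] : null;
      rows.push(filled);
    }
  }
  return { columns, rows };
}

// A file written before and a file written after a `score` column was added.
const oldFile: Frame = { columns: ["id", "name"], rows: [{ id: 1, name: "a" }] };
const newFile: Frame = {
  columns: ["id", "name", "score"],
  rows: [{ id: 2, name: "b", score: 0.5 }],
};

const merged = diagonalConcat([oldFile, newFile]);
console.log(merged.columns, merged.rows);
```

Calling `verticalConcat([oldFile, newFile])` on the same inputs throws, which mirrors the failure mode on schema-evolved tables mentioned in the comment.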

TheKnightCoder commented 2 months ago

What is the update on this? I would also love to use this to read/write Delta with Node.js. Huge unlock given the number of JS devs, imo.

universalmind303 commented 2 months ago

I had an initial POC for it here, but since then the Python API has changed quite a bit. I made some progress getting it to feature parity with the Python one, but it's not quite there yet.

I'm hoping to find some time to work on this again soon. If anyone else wants to work on it in the meantime, by all means!