elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License
1.04k stars, 112 forks

Add delta lake file support #752

Open the-destro opened 6 months ago

the-destro commented 6 months ago

Python Polars already supports Delta Lake files, but I can't find the corresponding function exposed in the Rust package.

josevalim commented 6 months ago

Can you please link to the Python version of the function?

watsy0007 commented 6 months ago

@the-destro https://pola-rs.github.io/polars/py-polars/html/reference/io.html#delta-lake Do you mean these three functions: scan_delta, read_delta, and DataFrame.write_delta()?

https://github.com/pola-rs/polars/blob/40d3e0818408d836abf6c31146a3f69fd628f0fb/py-polars/polars/io/delta.py#L295

Make sure to install deltalake>=0.8.0. Read the documentation here: https://delta-io.github.io/delta-rs/python/installation.html

The Rust package repository is https://github.com/delta-io/delta-rs

josevalim commented 2 months ago

Btw, isn't delta-lake storage pretty much Parquet files? Could you access them directly instead? Writing them would be a bit more complicated though.

watsy0007 commented 2 months ago

> Btw, isn't delta-lake storage pretty much Parquet files? Could you access them directly instead? Writing them would be a bit more complicated though.

Yes, that's correct. In our company, we integrate DuckDB, dbt, and Delta Lake with Python for business operations. I'm currently considering replacing some of these components with Elixir.
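
To make the "access the Parquet files directly" idea concrete, here is a minimal sketch of how a reader could work out which Parquet files belong to the current version of a Delta table. It relies only on the layout described in the Delta Lake protocol: a `_delta_log` directory of JSON commit files whose lines are actions such as `add` and `remove`. The function name `live_parquet_files` is hypothetical, and the sketch assumes every JSON commit is still present (it ignores checkpoint files) and that paths are stored relative to the table root.

```python
import json
from pathlib import Path


def live_parquet_files(table_dir):
    """Return the data files that make up the latest version of a Delta table.

    Hypothetical helper: replays the JSON commits in `_delta_log` in order.
    An "add" action registers a file; a "remove" action retires it.
    Assumption: no checkpoints are required, i.e. all JSON commits exist.
    """
    log_dir = Path(table_dir) / "_delta_log"
    live = set()
    # Commit names are zero-padded version numbers, so a lexical sort
    # is also a sort by version.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)
```

The returned paths could then be handed to any Parquet reader, for example `Explorer.DataFrame.from_parquet/2` on the Elixir side. Reading every `.parquet` file in the directory without consulting the log may pick up files that a later commit has removed, which is why the log replay matters even for read-only access.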