elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

Add delta lake file support #752

Status: Open (opened by the-destro 11 months ago)

the-destro commented 11 months ago

Python Polars already supports Delta Lake files, though I can't find the corresponding function exposed in the Rust package.

josevalim commented 11 months ago

Can you please link to the Python version of the function?

watsy0007 commented 11 months ago

@the-destro Looking at https://pola-rs.github.io/polars/py-polars/html/reference/io.html#delta-lake, are these the three functions: scan_delta, read_delta, and DataFrame.write_delta()?

https://github.com/pola-rs/polars/blob/40d3e0818408d836abf6c31146a3f69fd628f0fb/py-polars/polars/io/delta.py#L295

Make sure to install deltalake>=0.8.0. Read the documentation here: https://delta-io.github.io/delta-rs/python/installation.html

The Rust package repository is https://github.com/delta-io/delta-rs

josevalim commented 7 months ago

Btw, isn't Delta Lake storage pretty much Parquet files? Could you read them directly instead? Writing them would be a bit more complicated, though.

watsy0007 commented 7 months ago


Yes, that's correct. At our company, we use Python to integrate DuckDB, dbt, and Delta Lake for business operations, and I'm currently considering replacing some of these components with Elixir.
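To illustrate reading the Parquet files directly: a Delta table directory holds Parquet data files plus a `_delta_log/` folder of numbered JSON commit files, and the current snapshot consists of the files that were added and not later removed. The sketch below replays those commits using only the Python standard library. The function name `active_parquet_files` is hypothetical. It also assumes a table with no checkpoints, no deletion vectors, and relative file paths. This is a simplification of the Delta protocol, not what delta-rs or Polars do internally.

```python
import json
import re
from pathlib import Path
from urllib.parse import unquote

# Commit files are named <20-digit zero-padded version>.json.
COMMIT_FILE = re.compile(r"\d{20}\.json")


def active_parquet_files(table_path, version=None):
    """Return the data files in a Delta table snapshot by replaying _delta_log.

    Hypothetical helper, simplified on purpose: it ignores checkpoint files,
    deletion vectors, and the protocol/metaData actions a real reader must check.
    """
    log_dir = Path(table_path) / "_delta_log"
    active = {}  # dict keeps insertion order; keys are raw paths from the log

    # Zero-padded names mean sorting by name is sorting by version.
    for commit in sorted(log_dir.iterdir()):
        if not COMMIT_FILE.fullmatch(commit.name):
            continue  # skip _last_checkpoint, *.checkpoint.parquet, *.crc, ...
        if version is not None and int(commit.name[:20]) > version:
            break
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                active[action["add"]["path"]] = None
            elif "remove" in action:
                active.pop(action["remove"]["path"], None)

    # Paths in the log are URI-encoded and relative to the table root.
    return [str(Path(table_path) / unquote(p)) for p in active]
```

The returned paths could then be passed to any Parquet reader, for example `Explorer.DataFrame.from_parquet/2`, one file at a time. Writing is harder because a writer must also produce new commit files atomically and handle concurrent writers.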