delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

performance regression in 0.10.2 #1632

Closed: djouallah closed this issue 11 months ago

djouallah commented 1 year ago

Environment

Delta-rs version: 0.10.2

Binding: Python

Environment: Cloudflare R2


Bug

Sorry, this bug report is not very helpful, but I noticed that after updating to the latest release (0.10.2), performance degraded considerably. I am using Google Cloud Functions to write to Cloudflare R2, and I had to go back to 0.10.1.


djouallah commented 1 year ago

Same when writing to Azure from a Fabric notebook: resource usage increased significantly.
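A minimal sketch for putting numbers on the slowdown, so the same workload can be timed once with deltalake 0.10.1 installed and once with 0.10.2. It assumes the write under test is wrapped in a zero-argument callable; the helper name `median_write_seconds` is hypothetical and not part of delta-rs. It measures wall-clock time only, not CPU or memory usage.

```python
import statistics
import time
from typing import Callable


def median_write_seconds(write: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Call `write` repeatedly and return the median wall-clock duration in seconds.

    `write` is a hypothetical zero-argument callable wrapping the write under test.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls absorb one-off costs (imports, connection setup) and are not timed.
    for _ in range(warmup):
        write()
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        write()
        durations.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to a single slow network round trip.
    return statistics.median(durations)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without deltalake installed.
    print(f"{median_write_seconds(lambda: sum(range(100_000))):.4f} s")
```

For example, `write` could be `lambda: write_deltalake(path, df, mode="overwrite")`, where `path` and `df` are placeholder names for the same target and data in both runs. Comparing the median of several timed runs after a warm-up keeps one slow request to R2 or Azure from dominating the comparison.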

rtyler commented 1 year ago

Do you happen to have an example of some write code you can share? There are a few reasons I can imagine this changing.

It would also help to know which pyarrow version is being used (and/or which pandas version).

djouallah commented 1 year ago

Here is the code. I am not sure how to get the pandas version; I just run this code in Google Cloud Functions:

https://github.com/djouallah/aemo_tracker/blob/main/writedelta.py
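Because the code runs inside Google Cloud Functions, one way to find the pandas and pyarrow versions is to print them from the function itself; standard output should then appear in the function's logs. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `package_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {distribution name: installed version string, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in package_versions(["deltalake", "pyarrow", "pandas"]).items():
        print(f"{name}: {ver or 'not installed'}")
```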

ion-elgreco commented 11 months ago

@djouallah can you check whether you still see performance issues with v0.13?