Closed xbrianh closed 3 months ago
@xbrianh it's mainly slow because .to_pydict copies every value of the RecordBatch into a Python dict. Because of the large number of columns, you also get a lot of unnecessary entries for empty stats.
I've pushed a PR to fix this
Environment
Delta-rs version:
Binding: Python
Bug
What happened: Slow add_actions.to_pydict() for large numbers of columns.
What you expected to happen: same info faster
How to reproduce it:
On some Azure instances I see ~27 seconds. On my M2 Mac performance is better at ~9 seconds, but this still seems slow.
More details: This seems unusually slow, and also impacts deltalake read operations here.