delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Creating DeltaTable object slow #2518

Closed: braaannigan closed this issue 4 months ago

braaannigan commented 4 months ago

Environment

Delta-rs version: 0.17.4

Binding: python

Bug

What happened: I have a DeltaTable on S3 that is partitioned by date, with about 60 dates. The partitions have been compacted and vacuumed, so each one has a single file. I append to this table 100 times a day, so the transaction log has about 6000 JSON files.

Creating the DeltaTable object takes about 30 seconds.

What you expected to happen: I expected this operation to be faster, but I'm not sure whether that is a reasonable expectation.

How to reproduce it: No reproducible example; I'm just trying to establish whether there is something unusual here.

ion-elgreco commented 4 months ago

Did you checkpoint?

braaannigan commented 4 months ago

> Did you checkpoint?

I haven't come across that before. How do I do it?

ion-elgreco commented 4 months ago

With the latest version, it should automatically create a checkpoint every 100 commits, but you can also create one manually with DeltaTable.create_checkpoint().
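To see why a checkpoint matters for a log of about 6000 commits, here is a simplified, stdlib-only sketch of how a Delta reader chooses which log files to load: it takes the newest checkpoint and replays only the JSON commits after it. This illustrates the Delta log layout (20-digit zero-padded version file names); it is not delta-rs's actual code. `files_to_replay` is a hypothetical helper, and it ignores multi-part checkpoints and the `_last_checkpoint` hint file.

```python
import re

# Delta log file names are zero-padded to 20 digits, e.g.
# 00000000000000000042.json and 00000000000000000040.checkpoint.parquet.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def files_to_replay(log_files):
    """Return (checkpoint_file_or_None, commit_files) a reader would load.

    Hypothetical sketch: single-part checkpoints only; other log file
    types are ignored.
    """
    checkpoints = {}
    commits = {}
    for name in log_files:
        if m := CHECKPOINT_RE.match(name):
            checkpoints[int(m.group(1))] = name
        elif m := COMMIT_RE.match(name):
            commits[int(m.group(1))] = name

    start = -1  # with no checkpoint, every commit from version 0 is replayed
    checkpoint = None
    if checkpoints:
        start = max(checkpoints)
        checkpoint = checkpoints[start]
    # Only commits newer than the latest checkpoint need to be replayed.
    replay = [commits[v] for v in sorted(commits) if v > start]
    return checkpoint, replay


# Illustrative log: 6000 commits (versions 0..5999), roughly as in the report.
log = [f"{v:020d}.json" for v in range(6000)]
print(len(files_to_replay(log)[1]))  # 6000: every commit must be replayed

log.append(f"{5900:020d}.checkpoint.parquet")
print(len(files_to_replay(log)[1]))  # 99: only commits after the checkpoint
```

Under these assumptions, a checkpoint every 100 commits caps the number of JSON commits a fresh reader replays at roughly 100, instead of the whole history.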

PeterKeDer commented 4 months ago

We're also seeing a similar issue where constructing the DeltaTable takes 20-30 seconds, which is longer than we expected.

For context, we're using version 0.17.1. Our table is on AWS S3 and has 20,000 commits in its transaction log. We have a very recent checkpoint (only 5 versions behind the latest). The checkpoint Parquet file is 8 MB and has 20,000 rows.
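With a checkpoint only 5 versions behind the latest, the set of log files to fetch should be small. A minimal sketch, assuming a single-part checkpoint and a latest version that is already known (a real reader discovers it by listing `_delta_log/`): it parses the `_delta_log/_last_checkpoint` hint, whose `version` field names the checkpoint and whose `size` field counts the actions stored in it. `read_plan` is a hypothetical helper, and the numbers below are illustrative, not taken from the reporter's table.

```python
import json


def read_plan(last_checkpoint_json, latest_version):
    """Return (checkpoint_file, commit_files) needed to load `latest_version`.

    Hypothetical sketch: assumes a single-part checkpoint and that the
    latest version is already known.
    """
    hint = json.loads(last_checkpoint_json)
    cp_version = hint["version"]
    if latest_version < cp_version:
        raise ValueError("latest_version is older than the checkpoint")
    checkpoint = f"{cp_version:020d}.checkpoint.parquet"
    # Commits strictly after the checkpoint, up to and including latest_version.
    commits = [f"{v:020d}.json" for v in range(cp_version + 1, latest_version + 1)]
    return checkpoint, commits


# Illustrative numbers: a checkpoint holding 20000 actions, 5 versions behind.
checkpoint, commits = read_plan('{"version": 19994, "size": 20000}', 19999)
print(checkpoint)    # 00000000000000019994.checkpoint.parquet
print(len(commits))  # 5
```

Under these assumptions the reader fetches one Parquet checkpoint plus five small JSON commits, so the number of commit files alone should not explain a 20 to 30 second load.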

I traced the performance with some custom debug logs. Here are the operations that take the most time:

Edit: disregard these numbers; we were running in debug mode 😅
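For anyone repeating this kind of measurement, a minimal timing sketch follows. `time_call` is a hypothetical helper that reports the median wall-clock time of a zero-argument callable over several runs, which smooths out one-off network latency. As the edit above notes, numbers from a debug build of delta-rs are not representative; prebuilt wheels from PyPI are normally release builds.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times; return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For example, `time_call(lambda: DeltaTable("s3://my-bucket/my-table"))` with `DeltaTable` from the `deltalake` package would time table construction; the S3 path here is a placeholder, and credentials or `storage_options` depend on your setup.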

braaannigan commented 4 months ago

@ion-elgreco Shall I make a PR to document checkpointing a bit more?

rtyler commented 4 months ago

@braaannigan Improving our documentation is always welcome! I'm going to close this in the meantime.