Closed braaannigan closed 4 months ago
Did you checkpoint?
Haven't come across that before, how do I do it?
With the latest version it should automatically checkpoint every 100 commits, but you can also do it manually by calling DeltaTable.create_checkpoint()
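For anyone on a version that doesn't checkpoint automatically, here is a minimal sketch of doing it by hand after each append. The helper name `checkpoint_every` and the interval of 100 are just illustrative assumptions, not part of the library; the only calls it relies on are `version()` and `create_checkpoint()` on the table object.

```python
def checkpoint_every(table, interval=100):
    """Write a checkpoint when the table version is a multiple of `interval`.

    `table` is assumed to be a deltalake.DeltaTable (anything exposing
    version() and create_checkpoint() works). Returns True if a checkpoint
    was written. This helper is hypothetical, not part of deltalake.
    """
    version = table.version()
    # Skip version 0: there is nothing worth compacting yet.
    if version > 0 and version % interval == 0:
        table.create_checkpoint()
        return True
    return False
```

You would call it right after a write, e.g. `checkpoint_every(DeltaTable(table_uri))`, where `table_uri` is whatever path you already use. If several writers append concurrently a multiple of 100 can be skipped, so a one-off `create_checkpoint()` call is the simpler fix for an existing table.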
We're also seeing a similar issue where constructing the DeltaTable takes 20-30 seconds, which is longer than we expected.
For context, we're using version 0.17.1. Our table is on AWS S3 and has 20000 transaction logs. We have a very recent checkpoint (only 5 versions from the latest). The parquet is 8 MB in size and has 20000 rows.
I traced through the performance with some custom debug logs. Here are the operations that take the most time:
Edit: disregard these numbers, we were running on debug mode 😅
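In case it helps anyone else debugging slow loads: a rough way to check how many JSON commits sit after the latest checkpoint (and so have to be replayed on load). This is only a sketch that assumes you have a local copy of the `_delta_log` directory; for S3 you'd have to list the prefix yourself. It relies on the Delta log layout, where commits are named with a 20-digit zero-padded version and `_last_checkpoint` is a small JSON file with a `version` field. The function name is made up for illustration.

```python
import json
import re
from pathlib import Path

# Commit files look like 00000000000000000042.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def commits_after_last_checkpoint(log_dir):
    """Count JSON commits newer than the version recorded in _last_checkpoint.

    If there is no _last_checkpoint file, every commit is counted.
    """
    log_dir = Path(log_dir)
    pointer = log_dir / "_last_checkpoint"
    checkpoint_version = -1
    if pointer.exists():
        checkpoint_version = json.loads(pointer.read_text())["version"]

    versions = [
        int(m.group(1))
        for p in log_dir.iterdir()
        if (m := COMMIT_RE.match(p.name))
    ]
    return sum(1 for v in versions if v > checkpoint_version)
```

If this returns a number in the thousands, the table has probably never been checkpointed (or the pointer is stale), which would be consistent with slow DeltaTable construction.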
@ion-elgreco Shall I make a PR to document checkpointing a bit more?
@braaannigan improving our documentation is always welcome! I'm going to close this in the meantime
Environment
Delta-rs version: 0.17.4
Binding: python
Environment:
Bug
What happened: I have a DeltaTable on S3 partitioned by date with about 60 dates. The partitions have been compacted and vacuumed so have one file each. I append to this table 100 times a day so the transaction log has about 6000 json files.
When I try to create the DeltaTable object it takes 30 seconds.
What you expected to happen: I expected this operation to be faster but I'm not sure if that's a reasonable expectation?
How to reproduce it: No repro example, I'm just trying to establish if there is something unusual here
More details: