Open rob-harrison opened 6 days ago
We would expect only a small increase in working memory during checkpoint creation (as it should only need the last checkpoint + transaction logs - right?).
yes checkpoint + transaction logs + unexpired deleted files
are your transaction logs very big in size ?. what's your last checkpoint size ?
are your transaction logs very big in size ?. what's your last checkpoint size ?
transaction logs are ±15k each we do however have a lot of them if that's significant? (±400k - see checkpoint below)
2024-07-01 10:26:56 15507 00000000000000396955.json
2024-07-01 10:31:15 15409 00000000000000396956.json
2024-07-01 10:35:31 17004 00000000000000396957.json
2024-07-01 10:39:26 13652 00000000000000396958.json
2024-07-01 10:43:26 15457 00000000000000396959.json
2024-07-01 10:47:13 15504 00000000000000396960.json
last checkpoint size is ±155Mi
{"size":336792,"size_in_bytes":162615157,"version":396599}
hmm super weird .
maybe @rtyler can you help you more .
ohno.jpeg
:smile: This might be easier to pair with on Slack if you're open @rob-harrison since I have some questions which I doubt you'd want to answer in a public forum about the contents of some of those transaction logs. If there are a lot of actions that are being aggregated together and the checkpoint iteration is few and far between, I can imagine that requiring some hefty memory to compute since we have to load everything into memory.
Additionally if the table hasn't been optimized in a while, that can also increase memory with lots of small transactions.
Environment
Delta-rs version: deltalake==0.18.1
Binding: python (rust engine)
Environment:
Cloud provider: AWS OS: Mac 10.15.7 and Debian GNU/Linux 12 (bookworm) Other:
Bug
What happened: We have a kafka consumer appending batches of records (±6k) to delta using write_deltalake. Normal working memory is ±600Mi however when delta attempts to write its checkpoint every 100 batches, working memory spikes to ±7Gi (see grafana)
What you expected to happen: We would expect only a small increase in working memory during checkpoint creation (as it should only need the last checkpoint + transaction logs - right?). Would appreciate please any insight into why this might be happening and any options to alleviate it.
How to reproduce it:
More details: