delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Large Memory Spike At Checkpoint #2628

Open rob-harrison opened 6 days ago

rob-harrison commented 6 days ago

Environment

Delta-rs version: deltalake==0.18.1

Binding: python (rust engine)

Environment:

Cloud provider: AWS
OS: Mac 10.15.7 and Debian GNU/Linux 12 (bookworm)
Other:


Bug

What happened: We have a Kafka consumer appending batches of records (±6k) to Delta using write_deltalake. Normal working memory is ±600Mi; however, when Delta attempts to write its checkpoint every 100 batches, working memory spikes to ±7Gi (see the Grafana graph below).

What you expected to happen: We would expect only a small increase in working memory during checkpoint creation (as it should only need the last checkpoint + transaction logs, right?). We would appreciate any insight into why this might be happening and any options to alleviate it.

How to reproduce it:

from typing import List

from deltalake import DeltaTable, write_deltalake

    dt = DeltaTable(table_path, storage_options=storage_options)
    ...
    def append_heartbeats(self, events: List[Event]):
        # append heartbeats to datastore
        batch = build_pyarrow_batch(events=events)
        if batch.num_rows > 0:
            write_deltalake(
                dt,
                batch,
                partition_by="date",
                mode="append",
                engine="rust",
                large_dtypes=False,
            )
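
To isolate the spike from the Kafka consumer, a minimal sketch along these lines (reusing the table_path and storage_options above) forces a checkpoint directly; create_checkpoint() writes a checkpoint for the current table version, which is the operation that coincides with the spike described above:

from deltalake import DeltaTable

# Reuses table_path / storage_options from the snippet above.
dt = DeltaTable(table_path, storage_options=storage_options)

# Force a checkpoint for the current table version; this is the step
# where the large memory increase shows up in our monitoring.
dt.create_checkpoint()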

More details:

(Screenshot, 2024-06-27: Grafana working-memory graph showing the spike to ±7Gi at checkpoint time)
sherlockbeard commented 3 days ago

We would expect only a small increase in working memory during checkpoint creation (as it should only need the last checkpoint + transaction logs - right?).

Yes: checkpoint + transaction logs + unexpired deleted files.

Are your transaction logs very big in size? What's your last checkpoint size?
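
One way to answer those questions (a sketch, assuming the table lives on S3 and boto3 credentials are configured; the bucket and prefix below are placeholders) is to total up the commit JSONs and checkpoint parquets under _delta_log/:

import boto3

# Placeholder location; substitute the table's actual bucket/prefix.
bucket = "my-bucket"
prefix = "path/to/table/_delta_log/"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

json_bytes = parquet_bytes = json_count = parquet_count = 0
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".json"):
            json_count += 1
            json_bytes += obj["Size"]
        elif obj["Key"].endswith(".parquet"):
            parquet_count += 1
            parquet_bytes += obj["Size"]

print(f"{json_count} commit files totalling {json_bytes / 1024**2:.1f} MiB")
print(f"{parquet_count} checkpoint files totalling {parquet_bytes / 1024**2:.1f} MiB")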

rob-harrison commented 2 days ago

Are your transaction logs very big in size? What's your last checkpoint size?

Transaction logs are ±15k each; we do, however, have a lot of them, if that's significant (±400k, see the checkpoint version below).

2024-07-01 10:26:56      15507 00000000000000396955.json
2024-07-01 10:31:15      15409 00000000000000396956.json
2024-07-01 10:35:31      17004 00000000000000396957.json
2024-07-01 10:39:26      13652 00000000000000396958.json
2024-07-01 10:43:26      15457 00000000000000396959.json
2024-07-01 10:47:13      15504 00000000000000396960.json

Last checkpoint size is ±155Mi:

{"size":336792,"size_in_bytes":162615157,"version":396599}
sherlockbeard commented 2 days ago

Hmm, super weird.

Maybe @rtyler can help you more.

rtyler commented 1 day ago

(image: ohno.jpeg)

:smile: This might be easier to pair on over Slack if you're open to it, @rob-harrison, since I have some questions about the contents of some of those transaction logs that I doubt you'd want to answer in a public forum. If there are a lot of actions being aggregated together and checkpoints are few and far between, I can imagine that requiring some hefty memory to compute, since we have to load everything into memory.

Additionally, if the table hasn't been optimized in a while, lots of small transactions can also increase memory usage.
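
As a rough illustration of that suggestion (a sketch only, using the standard deltalake maintenance calls; table_path, storage_options, and the retention value are placeholders carried over from the repro), periodic compaction plus a fresh checkpoint and log cleanup reduces how much the checkpoint writer has to aggregate:

from deltalake import DeltaTable

dt = DeltaTable(table_path, storage_options=storage_options)

# Compact the many small files produced by frequent small appends.
dt.optimize.compact()

# Write a fresh checkpoint so future log replays start from here.
dt.create_checkpoint()

# Drop expired commit JSONs from _delta_log (honours the log retention setting).
dt.cleanup_metadata()

# List (dry run) data files no longer referenced and older than the retention window.
dt.vacuum(retention_hours=168, dry_run=True)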