delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Memory leak on writes/merges #2522

Open · echai58 opened this issue 4 months ago

echai58 commented 4 months ago

Environment

Binding: python


Bug

What happened: We're noticing constantly rising memory usage in our processes that write to deltalake. I wrote a minimal reproduction that writes to deltalake in a loop, and the memory profile seems to indicate a memory leak.

What you expected to happen: Memory to be reclaimed after writes.

How to reproduce it: This is the script I tested with:

import datetime
from deltalake import write_deltalake, DeltaTable
import numpy as np
import pandas as pd 
import pyarrow as pa

schema = pa.schema(
    [
        ("date", pa.date32()),
        ("c1", pa.string()),
        ("c2", pa.int32()),
        ("v1", pa.float64()),
        ("v2", pa.float64()),
        ("v3", pa.int32()),
    ]
)

def write_table(dt: DeltaTable, date: datetime.date):
    num_rows = 150
    data = pd.DataFrame(
        {
            "date": np.repeat(date, num_rows),
            "c1": [f"str_{i}" for i in np.random.randint(0, 100, num_rows)],
            "c2": np.random.randint(1, 100, num_rows),
            "v1": np.random.rand(num_rows) * 100,
            "v2": np.random.rand(num_rows) * 100,
            "v3": np.random.randint(0, 100, num_rows),
        }
    )
    dt.merge(
        pa.Table.from_pandas(data, schema=schema),
        "s.date = t.date and s.c1 = t.c1 and s.c2 = t.c2",
        source_alias="s",
        target_alias="t",
    ).when_matched_update_all().when_not_matched_insert_all().execute()
    if dt.version() % 10 == 0:
        dt.create_checkpoint()

def main():
    DeltaTable.create("test", schema)

    for date in pd.date_range("2022-01-01", periods=25):
        for _ in range(50):
            write_table(DeltaTable("test"), date.date())

if __name__ == "__main__":
    main()

More details: Here's the memray graph: [memray graph screenshot]

I also tested this with just write_deltalake(mode="append"), and the issue persists: [memray graph screenshot]
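For reference, this is roughly the append variant I tested (a sketch; it reuses the imports and schema from the merge script above, and the append_table helper name is just for illustration):

def append_table(date: datetime.date):
    # same random 150-row DataFrame as in write_table above
    num_rows = 150
    data = pd.DataFrame(
        {
            "date": np.repeat(date, num_rows),
            "c1": [f"str_{i}" for i in np.random.randint(0, 100, num_rows)],
            "c2": np.random.randint(1, 100, num_rows),
            "v1": np.random.rand(num_rows) * 100,
            "v2": np.random.rand(num_rows) * 100,
            "v3": np.random.randint(0, 100, num_rows),
        }
    )
    # plain append to the same local table instead of a merge
    write_deltalake("test", pa.Table.from_pandas(data, schema=schema), mode="append")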

I saw https://github.com/delta-io/delta-rs/issues/2068 and tried setting that env var, but it doesn't seem to help: [memray graph screenshot]

ion-elgreco commented 4 months ago

Can you do the following: change your script to sleep for 30 seconds first, then create the table, then sleep again for 30 seconds, and only then start the loop of 50 writes?

I think the slow increase in resident size might just be because the table state is updated at the end of every commit, and it includes more info each time.
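Roughly like this (a sketch that keeps your write_table helper, schema, and imports as-is):

import time

def main():
    time.sleep(30)  # let memory settle before the table exists
    DeltaTable.create("test", schema)
    time.sleep(30)  # settle again after table creation, before any writes
    for date in pd.date_range("2022-01-01", periods=25):
        for _ in range(50):
            write_table(DeltaTable("test"), date.date())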

echai58 commented 4 months ago

Here's the memory graph with the 30-second sleeps:

[memray graph screenshot]

echai58 commented 4 months ago

@ion-elgreco

This is a graph for 250 merges: [memray graph screenshot]

and for 250 appends: [memray graph screenshot]

I think the slow increase over the 250 appends corresponds to the growing table state metadata. But the memory use for merges rises much faster, so maybe it is something particular to merges?

ion-elgreco commented 4 months ago

Merge operations probably hold more info, so this looks normal to me.

echai58 commented 4 months ago

@ion-elgreco After running a script with 1000 merges, the final _last_checkpoint file says: {"size":360,"size_in_bytes":75770,"version":100}

That run resulted in the following memory increase: [memray graph screenshot]

The memory increase seems much larger than the checkpoint metadata.
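(For reference, I'm just reading the small JSON pointer file in the table's _delta_log directory; roughly:)

import json

# _last_checkpoint is the pointer file at the root of the delta log
with open("test/_delta_log/_last_checkpoint") as f:
    print(json.load(f))
# prints {'size': 360, 'size_in_bytes': 75770, 'version': 100}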

ion-elgreco commented 4 months ago

@echai58 The checkpoint is compressed, and it would never translate 1:1 from disk to memory anyway, afaik.

echai58 commented 4 months ago

@ion-elgreco Profiling a script that just instantiates the same Delta table gives the following: [memray graph screenshot]

~13 MB, which is still much less than the >100 MB seen from the merge script.
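(The instantiation-only script is essentially just this, run under memray:)

from deltalake import DeltaTable

def main():
    # load the table state for the same local table the merge script wrote to
    dt = DeltaTable("test")
    print(dt.version())

if __name__ == "__main__":
    main()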

malachibazar commented 2 months ago

Has there been any progress on this? I'm experiencing the same issue with merges.

rtyler commented 1 month ago

I have a theory that this might be related to some of the performance issues that @roeap and I were hunting after this week. He's got some fixes in mind, after which we can look into this specific issue some more.