apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[HUDI-8484] Instant heartbeats memory leak #12212

Closed. danny0405 closed this 2 weeks ago

danny0405 commented 2 weeks ago

Change Logs

The heartbeat should be removed from memory once it has been explicitly stopped (in the post-commit step, which means the instant has been committed). Otherwise, the heartbeat stays in memory until the write client itself is stopped. In a streaming pipeline, the write client is reused across multiple instants, so these stale entries accumulate.

The heartbeats are mainly designed for lazy cleaning, and the clean table service very probably runs asynchronously in a separate workload, so an in-memory heartbeat serves no purpose for an instant that has already been committed.
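
The idea can be illustrated with a minimal sketch, assuming a simplified in-memory registry keyed by instant time. The names `HeartbeatRegistry`, `start`, `stop`, and `trackedInstants` are hypothetical and chosen for illustration only; they are not Hudi's actual heartbeat client API, and the real client also writes heartbeat files, which this sketch omits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for a heartbeat client (not Hudi's real class).
class HeartbeatRegistry implements AutoCloseable {
    // One scheduled heartbeat task per in-flight instant.
    private final Map<String, ScheduledFuture<?>> heartbeats = new ConcurrentHashMap<>();
    private final ScheduledThreadPoolExecutor scheduler;
    private final long intervalMs;

    HeartbeatRegistry(long intervalMs) {
        this.intervalMs = intervalMs;
        this.scheduler = new ScheduledThreadPoolExecutor(1, runnable -> {
            Thread thread = new Thread(runnable, "heartbeat");
            thread.setDaemon(true);
            return thread;
        });
        // Also drop cancelled tasks from the executor's queue right away.
        this.scheduler.setRemoveOnCancelPolicy(true);
    }

    // Begin a periodic heartbeat for the given instant (no-op if already started).
    void start(String instantTime) {
        heartbeats.computeIfAbsent(instantTime, t ->
            scheduler.scheduleAtFixedRate(
                () -> { /* a real client would refresh a heartbeat marker here */ },
                0, intervalMs, TimeUnit.MILLISECONDS));
    }

    // Stop the heartbeat AND forget the instant, so a long-lived
    // registry does not accumulate entries for committed instants.
    void stop(String instantTime) {
        ScheduledFuture<?> task = heartbeats.remove(instantTime);
        if (task != null) {
            task.cancel(false);
        }
    }

    int trackedInstants() {
        return heartbeats.size();
    }

    // Shutting down the whole registry cancels whatever is still tracked.
    @Override
    public void close() {
        heartbeats.values().forEach(task -> task.cancel(false));
        heartbeats.clear();
        scheduler.shutdownNow();
    }
}
```

With this shape, a streaming writer that reuses one registry and calls `start` and then `stop` for each instant keeps the map size bounded by the number of in-flight instants rather than by the total number of instants written.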

Impact

none

Risk level (write none, low, medium or high below)

none

Documentation Update

none

Contributor's checklist

hudi-bot commented 2 weeks ago

CI report:

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build