apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
https://celeborn.apache.org/
Apache License 2.0

[CELEBORN-1544][FOLLOWUP] ShuffleWriter needs to catch exception and call abort to avoid memory leaks #2663

Closed: cxzl25 closed this pull request 3 weeks ago

cxzl25 commented 1 month ago

What changes were proposed in this pull request?

This PR aims to fix a possible memory leak in ShuffleWriter.

It introduces a private abort method. When an exception occurs while writing, the ShuffleWriter catches it and calls abort to release the memory it holds.
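A minimal sketch of the catch-and-abort pattern follows. `SketchShuffleWriter` and `MemoryTracker` are hypothetical names used only for illustration; they are not Celeborn classes, and the real writers' buffering and memory accounting differ. On failure, abort releases the buffered memory and the exception is rethrown. close, which stands in for the step that reports the mapper end, runs only on the success path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for buffer memory accounting; not a Celeborn class.
class MemoryTracker {
    private long used;
    void acquire(long bytes) { used += bytes; }
    void release(long bytes) { used -= bytes; }
    long used() { return used; }
}

// Hypothetical writer illustrating catch-and-abort; not the actual Celeborn ShuffleWriter.
class SketchShuffleWriter {
    private final MemoryTracker memory;
    private final List<byte[]> buffers = new ArrayList<>();
    private boolean mapperEndCalled = false;

    SketchShuffleWriter(MemoryTracker memory) { this.memory = memory; }

    void write(List<byte[]> records) throws Exception {
        try {
            for (byte[] record : records) {
                if (record == null) {
                    // Simulated failure partway through the task.
                    throw new IllegalStateException("bad record");
                }
                buffers.add(record);
                memory.acquire(record.length);
            }
            close();
        } catch (Exception e) {
            // Free buffered memory without reporting the map task as finished.
            abort();
            throw e;
        }
    }

    // Success path only: stands in for flushing data and calling mapperEnd.
    private void close() {
        mapperEndCalled = true;
        releaseBuffers();
    }

    // Failure path: release memory, do not report mapper end.
    private void abort() {
        releaseBuffers();
    }

    private void releaseBuffers() {
        for (byte[] b : buffers) {
            memory.release(b.length);
        }
        buffers.clear();
    }

    boolean mapperEndCalled() { return mapperEndCalled; }
}
```

In this sketch, a task that fails after buffering 24 bytes ends with `memory.used()` back at 0 and `mapperEndCalled()` still false.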

Why are the changes needed?

https://github.com/apache/celeborn/pull/2661 calls the close method in a finally block, but close invokes shuffleClient.mapperEnd. That is dangerous for incomplete tasks: mapperEnd is then called even though the task did not finish writing, so the shuffle data may be inaccurate.
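The difference between the two approaches can be sketched with a call log. `RecordingClient`, `FinallyCloseWriter`, and `CatchAbortWriter` are hypothetical names; `RecordingClient` only records calls and does not mirror Celeborn's ShuffleClient API. The first writer follows the finally-close shape described above, the second follows roughly the catch-and-abort shape this PR describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client that only records calls; not Celeborn's ShuffleClient.
class RecordingClient {
    final List<String> calls = new ArrayList<>();
    void push(int partition) { calls.add("push:" + partition); }
    void mapperEnd() { calls.add("mapperEnd"); }
}

abstract class TaskWriter {
    protected final RecordingClient client;
    TaskWriter(RecordingClient client) { this.client = client; }

    // Pushes each partition id; a negative id simulates a mid-task failure.
    protected void pushAll(int[] partitions) {
        for (int p : partitions) {
            if (p < 0) throw new IllegalStateException("task failed");
            client.push(p);
        }
    }

    abstract void run(int[] partitions);
}

// finally-close shape: mapperEnd runs even when pushAll threw.
class FinallyCloseWriter extends TaskWriter {
    FinallyCloseWriter(RecordingClient c) { super(c); }
    void run(int[] partitions) {
        try {
            pushAll(partitions);
        } finally {
            client.mapperEnd();
        }
    }
}

// catch-and-abort shape: mapperEnd only on success, abort on failure.
class CatchAbortWriter extends TaskWriter {
    boolean aborted = false;
    CatchAbortWriter(RecordingClient c) { super(c); }
    void run(int[] partitions) {
        try {
            pushAll(partitions);
            client.mapperEnd();
        } catch (RuntimeException e) {
            aborted = true; // release local resources here; no mapperEnd
            throw e;
        }
    }
}
```

With input `{0, 1, -1}`, the finally-based writer records `push:0`, `push:1`, `mapperEnd` even though the task failed, while the catch-and-abort writer records only the two pushes and sets `aborted`.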

Does this PR introduce any user-facing change?

No

How was this patch tested?

GA (GitHub Actions)

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests.

Project coverage is 33.31%. Comparing base (ea6617c) to head (3dd119b). Report is 28 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2663      +/-   ##
==========================================
- Coverage   39.83%   33.31%   -6.51%
==========================================
  Files         239      310      +71
  Lines       15026    18227    +3201
  Branches     1362     1675     +313
==========================================
+ Hits         5984     6071      +87
- Misses       8711    11816    +3105
- Partials      331      340       +9
```
