apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
https://celeborn.apache.org/
Apache License 2.0

[CELEBORN-1727] Correct the calculation of worker diskInfo actualUsableSpace #2931

Closed: onebox-li closed this 3 days ago

onebox-li commented 4 days ago

What changes were proposed in this pull request?

- Correct the calculation of worker diskInfo actualUsableSpace.
- Rename the function that returns the reserve size so its meaning is clearer (getMinimumUsableSize -> getActualReserveSize).
- Have deviceMonitor call startCheck only after the first storageManager.updateDiskInfos(), so that disks are not misidentified as HIGH_DISK_USAGE.
- Fix the condition in PushDataHandler#checkDiskFull.
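For readers unfamiliar with the reserve logic, the relationship between the reserve size, the usable space, and the disk-full check can be sketched roughly as below. This is a hypothetical illustration, not the code in this PR: the class `DiskReserveSketch`, its method signatures, the ratio-based reserve, and the exact formulas are all assumptions chosen to show the general idea (subtract the reserve from what the filesystem reports, and never report a negative value).

```java
// Hypothetical sketch: names and formulas are illustrative assumptions,
// not Celeborn's actual implementation.
final class DiskReserveSketch {
    private DiskReserveSketch() {}

    /**
     * Bytes to hold back on a disk. Assumed here to be the larger of a fixed
     * reserve size and a fraction of the disk's total capacity.
     */
    static long actualReserveSize(long totalSpace, long reserveSize, double reserveRatio) {
        long ratioBased = (long) Math.ceil(totalSpace * reserveRatio);
        return Math.max(reserveSize, ratioBased);
    }

    /**
     * Space the worker may still write to: what the filesystem reports as
     * usable, minus the reserve, floored at zero.
     */
    static long actualUsableSpace(long fsUsableSpace, long reserve) {
        return Math.max(0L, fsUsableSpace - reserve);
    }

    /** A disk is treated as full once no space remains above the reserve. */
    static boolean isDiskFull(long actualUsableSpace) {
        return actualUsableSpace <= 0L;
    }
}
```

Under these assumptions, a disk with 1000 bytes of total capacity, a 50-byte fixed reserve, and a 0.25 ratio holds back 250 bytes; if the filesystem reports 200 usable bytes, the actual usable space is 0 and the disk counts as full. The sketch also hints at why the first updateDiskInfos() should run before any monitoring check: until the usable space has been computed once, a check would be reading an uninitialized value.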

Why are the changes needed?

To make sure the worker disk reserve works correctly.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Cluster test and UT.

turboFei commented 4 days ago

It seems many UT failures need to be fixed. @onebox-li

- celeborn spark integration test - hash-checkDiskFull *** FAILED ***
  235 was not less than or equal to 0 (CelebornHashCheckDiskSuite.scala:83)