Fix the incorrect decrement of pendingWrites for FileWriter
Improve the logging for hardSplit and exceptions
Why are the changes needed?
handlePushMergeData writes data through multiple file writers. If the previous FileWriter has already been closed, the next decrementPendingWrites call is applied to the wrong FileWriter. One writer's pending counter then never returns to zero, and commitFiles times out:
java.io.IOException: Wait pending actions timeout, counter 1
at org.apache.celeborn.service.deploy.worker.storage.PartitionDataWriter.waitOnNoPending(PartitionDataWriter.java)
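The mismatch can be illustrated with a minimal, self-contained sketch. It is not the actual handler code: `PendingWritesSketch`, `Writer`, `buggyPush`, `fixedPush`, and `pendingAfter` are hypothetical names, and the stale `current` reference is only one assumed way a decrement could land on the wrong writer. It shows how a closed writer can be left with a pending count of 1, which matches the counter in the timeout above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PendingWritesSketch {
    /** Hypothetical stand-in for a file writer; not Celeborn's real class. */
    static final class Writer {
        final AtomicInteger pending = new AtomicInteger();
        final boolean closed;

        Writer(boolean closed) { this.closed = closed; }

        void incrementPendingWrites() { pending.incrementAndGet(); }
        void decrementPendingWrites() { pending.decrementAndGet(); }
        void write(byte[] data) { /* the real writer would buffer or flush here */ }
    }

    /** Assumed buggy pattern: the decrement goes through a reference that can be stale. */
    static void buggyPush(List<Writer> writers, byte[] data) {
        Writer current = null;
        for (Writer w : writers) {
            if (!w.closed) {
                current = w;
                w.write(data);
            }
            // When w is closed, current still points at an earlier writer,
            // so w's own pending count is never decremented.
            if (current != null) {
                current.decrementPendingWrites();
            }
        }
    }

    /** Sketch of the fix: always decrement the writer whose count was incremented. */
    static void fixedPush(List<Writer> writers, byte[] data) {
        for (Writer w : writers) {
            try {
                if (!w.closed) {
                    w.write(data);
                }
            } finally {
                w.decrementPendingWrites();
            }
        }
    }

    /** Runs one merged push over an open writer and a closed writer; returns both pending counts. */
    static int[] pendingAfter(boolean useFix) {
        Writer open = new Writer(false);
        Writer closed = new Writer(true);
        List<Writer> writers = List.of(open, closed);
        // Each writer is marked as having one pending write before the push is handled.
        for (Writer w : writers) {
            w.incrementPendingWrites();
        }
        byte[] data = new byte[] {1, 2, 3};
        if (useFix) {
            fixedPush(writers, data);
        } else {
            buggyPush(writers, data);
        }
        return new int[] {open.pending.get(), closed.pending.get()};
    }

    public static void main(String[] args) {
        int[] buggy = pendingAfter(false);
        int[] fixed = pendingAfter(true);
        System.out.println("buggy: open=" + buggy[0] + " closed=" + buggy[1]);
        System.out.println("fixed: open=" + fixed[0] + " closed=" + fixed[1]);
    }
}
```

In the buggy variant the open writer is decremented twice and the closed writer keeps a count of 1, so any wait for that counter to reach zero would run until it times out. In the fixed variant both counters return to zero.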
What changes were proposed in this pull request?
Does this PR introduce any user-facing change?
No
How was this patch tested?
Passes GA and manual tests