NVIDIA / sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
https://www.sparkucx.org/
BSD 3-Clause "New" or "Revised" License

add synchronization for transport initialization #30

Closed: jeynmann closed this 1 year ago

jeynmann commented 1 year ago

Add a latch to synchronize transport initialization, in case Spark tasks try to get a writer/reader before transport initialization has finished.
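A minimal Java sketch of the latch pattern described above, assuming a single initialization thread and using `java.util.concurrent.CountDownLatch`. The class `LatchedTransport` and its methods `init`, `getWriter`, and `getReader` are hypothetical stand-ins, not the sparkucx API, and the PR's actual change may be structured differently. The idea: the latch starts at 1, initialization counts it down once, and any task asking for a writer/reader blocks on the latch until then.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a shuffle transport; NOT the sparkucx API.
public class LatchedTransport {
    // Count of 1: released exactly once, when init() finishes (success or failure).
    private final CountDownLatch initDone = new CountDownLatch(1);
    // Written before countDown(); the latch's happens-before guarantee makes it visible to waiters.
    private boolean initialized = false;

    // Runs on the initialization thread. The sleep stands in for real setup work (hypothetical).
    public void init(long simulatedSetupMillis) {
        try {
            Thread.sleep(simulatedSetupMillis);
            initialized = true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            initDone.countDown(); // always release waiters so tasks never hang on a failed init
        }
    }

    // Blocks the calling task until init() has finished, then checks that it succeeded.
    private void awaitInit() {
        try {
            initDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for transport init", e);
        }
        if (!initialized) {
            throw new IllegalStateException("transport initialization failed");
        }
    }

    // Hypothetical accessors: tasks call these to obtain a writer/reader.
    public String getWriter() {
        awaitInit();
        return "writer";
    }

    public String getReader() {
        awaitInit();
        return "reader";
    }

    public static void main(String[] args) {
        LatchedTransport transport = new LatchedTransport();
        new Thread(() -> transport.init(200)).start();
        // This "task" asks for a writer while init is still running; it waits instead of failing.
        System.out.println(transport.getWriter()); // prints "writer"
    }
}
```

Counting the latch down in `finally` means a failed initialization surfaces as an exception in the waiting task rather than as a hang. A production version might prefer `CountDownLatch.await(long, TimeUnit)` with a timeout, so tasks fail fast if the transport never comes up.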

gleon99 commented 1 year ago

Do we need to port it to the DPU branch as well?

jeynmann commented 1 year ago

> Do we need to port it to the DPU branch as well?

Yes, IMO. This race is likely to occur if transport initialization takes longer than stage 0, or if executor resources are insufficient at the start of the Spark application.