Closed leoluan2009 closed 1 week ago
@zhztheplayer @zhouyuan can you give some thoughts? thanks!
Driver and executor already have different plugin entry points, see https://github.com/apache/incubator-gluten/blob/main/backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxListenerApi.scala. Are you suggesting a new approach?
But in VeloxBackend.cc we cannot tell which process it is running in. That information is not passed from the Java code to the C++ code.
I see. Do you know which part of the C++ code needs this information?
If it runs in the driver, it should not initialize the Velox cache: https://github.com/apache/incubator-gluten/blob/main/cpp/velox/compute/VeloxBackend.cc#L197
I am curious why it matters whether the cache is initialized on the driver. Have you already seen issues or errors in your environment?
BTW, I'd prefer changing the JNI API to have different paths for driver and executor native initialization if we have to do it.
Yes. Initializing the cache creates the cache directory and checks the remaining disk capacity, and the Spark driver node may have a smaller disk than the executors.
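To make the failure mode concrete, here is a minimal sketch of a "create the directory, then check free space" step. It is not the actual Velox or Gluten code; the helper name `hasRoomForCache` and its parameters are hypothetical, and it only uses `std::filesystem` from C++17.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical helper (not part of Velox or Gluten): create cacheDir if it
// does not exist and report whether the volume holding it has at least
// requiredBytes available to the current process.
bool hasRoomForCache(const std::filesystem::path& cacheDir,
                     std::uintmax_t requiredBytes) {
  std::error_code ec;
  // Using the error_code overloads so a failure is reported, not thrown.
  std::filesystem::create_directories(cacheDir, ec);
  if (ec) {
    return false;  // Could not create the cache directory.
  }
  const std::filesystem::space_info info = std::filesystem::space(cacheDir, ec);
  if (ec) {
    return false;  // Could not query the volume.
  }
  return info.available >= requiredBytes;
}
```

A check of this shape, sized for executor disks, would return false on a driver node with a smaller disk even though the driver may never use the cache.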
Do we start any Velox pipeline on the driver today? Where is the cache initialized?
It looks like only the BHJ hash build could run on the driver, and we haven't implemented that yet.
This line checks the SSD space: https://github.com/apache/incubator-gluten/blob/c653337cdf54067cd4a01d14b908a521fdd11b3a/cpp/velox/compute/VeloxBackend.cc#L217
Thank you. Then we should initialize Velox on the driver and the workers differently.
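One possible shape for that, as a rough sketch only: the JVM side passes a role flag through the native initialization call, and the C++ side skips cache setup on the driver. Everything here is hypothetical and for illustration: `ProcessRole`, `BackendSketch`, and the `cache.enabled` key are made-up names, not Gluten's real API or configuration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: which Spark process loaded the native library. In a real
// change this value would have to be passed in from the JVM side.
enum class ProcessRole { kDriver, kExecutor };

// Hypothetical stand-in for the native backend. It only records which
// initialization steps ran, so the control flow is easy to follow.
class BackendSketch {
 public:
  void init(const std::unordered_map<std::string, std::string>& conf,
            ProcessRole role) {
    // Steps assumed to be needed in both processes.
    steps_.push_back("memory");
    steps_.push_back("connectors");

    // "cache.enabled" is a made-up key for this sketch.
    const auto it = conf.find("cache.enabled");
    const bool cacheEnabled = it != conf.end() && it->second == "true";

    // Skip cache setup (directory creation, disk space check) on the driver.
    if (cacheEnabled && role == ProcessRole::kExecutor) {
      steps_.push_back("cache");
    }
  }

  const std::vector<std::string>& steps() const { return steps_; }

 private:
  std::vector<std::string> steps_;
};
```

With `cache.enabled` set to `true`, calling `init` with `ProcessRole::kDriver` records only the two shared steps, while `ProcessRole::kExecutor` also records `cache`. Separate JNI entry points for driver and executor, as suggested earlier in the thread, would reach the same result without a role parameter.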
Description
VeloxBackend should know where it is running: executor or driver. For example, if it runs on the driver, it should not initialize the Velox cache. There are two ways to implement this enhancement: