Open lmouhib opened 3 weeks ago
Thats true, thanks for raising this. Spark Kubernetes pods respect Spark params but has no any idea about Comet params for now. Once pod memory allocated its not possible to change it. Since Comet has no control over Spark params, I'm inclining into including spark.comet.memory.overhead.factor
into spark memoryOverhead
but open to other propositions
I am inclined to including it to the spark overhead, however, I am not sure to understand why its not taken into consideration here. That line of code should provide the new value of the memory overhead, which is then passed to spark and the RM to create the pod template that gets applied to create the pod.
I think what might happen is the executor pod already started by the time Comet tries to tweak the memory and once the pod is up, its not possible to change the allocated memory size. What Comet does is correct, but I have some feeling it should have done earlier
If CometDriverPlugin
is loaded, it should overwrite executor overhead by existing executor overhead + Comet overhead.
We need to document the CometDriverPlugin
, its not something that is mentioned int he configurations or any page of the documentation.
I did set the spark.plugins
to org.apache.spark.CometPlugin
, and still did not observe any change in the executor container memory configuarion.
I'll run some tests on this today
How is the documentation for the configuration generated? The only mention there is of the plugin is in the code base itself. I can open a PR to add the conf, if its done in the documentation itself and not auto generated.
Depends on #689
Describe the bug
Currently when submitting a job on kubernetes, the total memory of the driver or executor is the sum of the memory defined in the spark configuration and the overhead (
spark.{executor|driver}.memoryOverhead
+spark.{executor|driver}.memory
). This does not included the default values defined byspark.comet.memory.overhead.factor
.Steps to reproduce
No response
Expected behavior
No response
Additional context
No response