When trying to implement the Yunikorn scheduler, I noticed that the annotations set for the job were not correct. I tried to run a batch job with `executor-memoryOverhead` set to `1g`. You'd expect that the resulting `memoryOverhead` in the `SparkApplication` would be `1g`; however, on inspecting the `SparkApplication`, I noticed it was set to only `128m`.
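For reference, a minimal sketch of such a submission, assuming a recent openeo Python client that accepts a `job_options` argument; the backend URL, collection, extent and band are placeholders, not the original job. The only part that matters is the `job_options` dict: it requests `1g` of executor memory overhead and leaves `python-memory` unset.

```python
import openeo

# Hypothetical sketch (not the original job): backend URL, collection, extent
# and band are placeholders. The relevant part is job_options, which requests
# 1g of executor memory overhead and does not set "python-memory".
connection = openeo.connect("https://openeo.example.org").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.00, "south": 51.00, "east": 5.10, "north": 51.10},
    temporal_extent=["2024-05-01", "2024-05-31"],
    bands=["B04"],
)

job = cube.create_job(
    title="executor-memoryOverhead test",
    job_options={"executor-memoryOverhead": "1g"},
)
job.start_and_wait()
```

The resulting `SparkApplication` can then be inspected on the cluster (e.g. `kubectl get sparkapplication <name> -o yaml`) to check `spec.executor.memoryOverhead`.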
The culprit seems to be here: https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/openeogeotrellis/backend.py#L1821-L1835

```python
memOverheadBytes = as_bytes(executor_memory_overhead)
jvmOverheadBytes = as_bytes("128m")
# By default, Python uses the space reserved by `spark.executor.memoryOverhead` but no limit is enforced.
# When `spark.executor.pyspark.memory` is specified, Python will only use this memory and no more.
python_max = job_options.get("python-memory", None)
if python_max is not None:
    python_max = as_bytes(python_max)
    if "executor-memoryOverhead" not in job_options:
        memOverheadBytes = jvmOverheadBytes
        executor_memory_overhead = f"{memOverheadBytes//(1024**2)}m"
else:
    # If python-memory is not set, we convert most of the overhead memory to python memory
    python_max = memOverheadBytes - jvmOverheadBytes
    executor_memory_overhead = f"{jvmOverheadBytes//(1024**2)}m"
```
When `python-memory` is not set, it seems that `executor_memory_overhead` is always overwritten with `jvmOverheadBytes` (i.e. `128m`), and the `executor-memoryOverhead` requested in the job options is ignored.
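To make the effect concrete, here is a standalone trace of that `else` branch with the values from this report; the `as_bytes` helper below is a simplified stand-in, not the driver's own parsing.

```python
# Trace of the snippet above, assuming executor-memoryOverhead is requested
# as 1g and python-memory is not set in the job options.

def as_bytes(size: str) -> int:
    """Simplified stand-in (assumption) for the driver's size parsing."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    return int(size[:-1]) * units[size[-1].lower()]

job_options = {"executor-memoryOverhead": "1g"}  # no "python-memory"
executor_memory_overhead = job_options["executor-memoryOverhead"]

memOverheadBytes = as_bytes(executor_memory_overhead)  # 1073741824 (1g)
jvmOverheadBytes = as_bytes("128m")                    # 134217728

python_max = job_options.get("python-memory", None)
if python_max is not None:
    python_max = as_bytes(python_max)
    if "executor-memoryOverhead" not in job_options:
        memOverheadBytes = jvmOverheadBytes
        executor_memory_overhead = f"{memOverheadBytes//(1024**2)}m"
else:
    # This is the branch taken here: the requested overhead is mostly
    # reassigned to Python memory, and the overhead itself is reset to 128m.
    python_max = memOverheadBytes - jvmOverheadBytes
    executor_memory_overhead = f"{jvmOverheadBytes//(1024**2)}m"

print(executor_memory_overhead)         # 128m, not the requested 1g
print(f"{python_max//(1024**2)}m")      # 896m
```

So with `executor-memoryOverhead: 1g` and no `python-memory`, most of the requested overhead is reassigned to Python memory, and the `SparkApplication` only gets `128m` of `memoryOverhead`.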