Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)

Executor memoryOverhead is not respected on Kube batch job when python-memory is not set #870

Closed: tcassaert closed this issue 1 month ago

tcassaert commented 2 months ago

When trying to implement the Yunikorn scheduler, I noticed that the annotations set for the job were not correct.

I tried to run the following batch job:

import openeo
import json

with open("pg.json") as pg:
    graph = json.load(pg)

connection = openeo.connect("http://localhost:50001").authenticate_oidc()
cube = connection.datacube_from_flat_graph(graph)

job_options = {
    "driver-memory": "3G",
    "driver-cores": "1",
    "executor-memory": "1500m",
    "executor-memoryOverhead": "1g",
    "executor-cores": "1",
    "executor-request-cores": "500m",
    "max-executors": "30",
    "executor-threads-jvm": "7",
    "logging-threshold": "info"
}
cube.execute_batch(out_format="netCDF", job_options=job_options)

You'd expect the resulting memoryOverhead in the SparkApplication to be 1g; however, on inspecting the SparkApplication, I noticed it was set to only 128m.

The culprit seems to be here: https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/openeogeotrellis/backend.py#L1821-L1835

            memOverheadBytes = as_bytes(executor_memory_overhead)
            jvmOverheadBytes = as_bytes("128m")

            # By default, Python uses the space reserved by `spark.executor.memoryOverhead` but no limit is enforced.
            # When `spark.executor.pyspark.memory` is specified, Python will only use this memory and no more.
            python_max = job_options.get("python-memory", None)
            if python_max is not None:
                python_max = as_bytes(python_max)
                if "executor-memoryOverhead" not in job_options:
                    memOverheadBytes = jvmOverheadBytes
                    executor_memory_overhead = f"{memOverheadBytes//(1024**2)}m"
            else:
                # If python-memory is not set, we convert most of the overhead memory to python memory
                python_max = memOverheadBytes - jvmOverheadBytes
                executor_memory_overhead = f"{jvmOverheadBytes//(1024**2)}m"
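
With the job options from the script above (executor-memoryOverhead of 1g, no python-memory), the arithmetic works out like this. This is only a self-contained trace of the else branch in the excerpt; the inline as_bytes is a stand-in for the driver's helper, not the real implementation:

def as_bytes(size: str) -> int:
    # Stand-in for the driver's as_bytes helper (assumption: parses "1g"/"128m"/"1500m" into bytes)
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    size = size.strip().lower()
    if size[-1] in units:
        return int(float(size[:-1]) * units[size[-1]])
    return int(size)

job_options = {"executor-memoryOverhead": "1g"}  # python-memory is not set, as in the report

memOverheadBytes = as_bytes(job_options["executor-memoryOverhead"])  # 1073741824 (1g)
jvmOverheadBytes = as_bytes("128m")                                   # 134217728

# else branch from the excerpt: everything above 128m is handed to Python ...
python_max = memOverheadBytes - jvmOverheadBytes                      # 939524096 (896m)
# ... and the overhead written to the SparkApplication shrinks to the JVM part only:
executor_memory_overhead = f"{jvmOverheadBytes // (1024 ** 2)}m"

print(python_max // (1024 ** 2), executor_memory_overhead)            # 896 128m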

When python-memory is not set, the else branch always overwrites executor_memory_overhead with the 128m jvmOverheadBytes and hands the remainder to python_max, so the executor-memoryOverhead passed in the job options is effectively ignored.
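
A possible direction for a fix, only an untested sketch that reuses the variable names from the excerpt above, would be to repurpose the overhead as Python memory only when the user did not explicitly pass executor-memoryOverhead, and otherwise keep their value untouched:

python_max = job_options.get("python-memory", None)
if python_max is not None:
    python_max = as_bytes(python_max)
    if "executor-memoryOverhead" not in job_options:
        memOverheadBytes = jvmOverheadBytes
        executor_memory_overhead = f"{memOverheadBytes // (1024 ** 2)}m"
elif "executor-memoryOverhead" not in job_options:
    # Only convert the default overhead to Python memory when the user did not
    # explicitly request an executor-memoryOverhead; otherwise leave it as-is,
    # so Python falls back to sharing the reserved overhead without a hard limit.
    python_max = memOverheadBytes - jvmOverheadBytes
    executor_memory_overhead = f"{jvmOverheadBytes // (1024 ** 2)}m"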