Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

Executor OOM on simple process graph #238

Closed: jdries closed this issue 3 months ago

jdries commented 9 months ago

Executors run out of memory on a relatively simple process graph:

{ "process_graph": { "loadco1": { "arguments": { "bands": [ "B03", "B04", "B02" ], "id": "SENTINEL2_L2A", "spatial_extent": { "east": 13.132874787754762, "north": 42.192832302555644, "south": 41.200957744003716, "west": 11.966782528409608 }

, "temporal_extent": [ "2023-08-18T00:00:00Z", null ] }, "description": "Load the data, including the bands:\r\n- G = B03\r\n- R = B04\r\n- B = B02", "process_id": "load_collection" }, "reduce1": { "arguments": { "data": { "from_node": "loadco1" }

, "dimension": "bands", "reducer": { "process_graph": { "add1": { "arguments": { "x": { "from_node": "multip2" }

, "y": { "from_node": "arraye2" }

}, "process_id": "add" }, "add2": { "arguments": { "x": { "from_node": "add1" }

, "y": { "from_node": "arraye3" }

}, "process_id": "add" }, "arraye1": { "arguments": { "data": { "from_parameter": "data" }

, "index": 0 }, "process_id": "array_element" }, "arraye2": { "arguments": { "data": { "from_parameter": "data" }

, "index": 1 }, "process_id": "array_element" }, "arraye3": { "arguments": { "data": { "from_parameter": "data" }

, "index": 2 }, "process_id": "array_element" }, "divide1": { "arguments": { "x": { "from_node": "subtra2" }

, "y": { "from_node": "add2" }

}, "process_id": "divide", "result": true }, "multip1": { "arguments": { "x": 2, "y": { "from_node": "arraye1" }

}, "process_id": "multiply" }, "multip2": { "arguments": { "x": 2, "y": { "from_node": "arraye1" }

}, "process_id": "multiply" }, "subtra1": { "arguments": { "x": { "from_node": "multip1" }

, "y": { "from_node": "arraye2" }

}, "process_id": "subtract" }, "subtra2": { "arguments": { "x": { "from_node": "subtra1" }

, "y": { "from_node": "arraye3" }

}, "process_id": "subtract" } } } }, "description": "Compute the GLI (Green Leaf Index) for the bands dimension\r\nFormula: (2.0 G - R - B) / (2.0 G + R + B)", "process_id": "reduce_dimension" }, "savere1": { "arguments": { "data": { "from_node": "reduce1" }

, "format": "NETCDF" }, "description": "Store as NETCDF", "process_id": "save_result", "result": true } } }

jdries commented 9 months ago

Some analysis: it's the Kubernetes OOM killer that kills the executors, so the problem is the off-heap memory overhead, which seems to be higher than 1800MB. Maybe GDAL gets a bit too much cache, or the JVM should get a little less heap memory.
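
To spell out the sizing involved (a sketch, assuming the usual Spark-on-Kubernetes layout where the pod memory limit is roughly executor-memory plus memoryOverhead):

    # The JVM heap is capped by executor-memory; everything off-heap (GDAL
    # block cache, cached datasets, JNI/netCDF buffers, ...) has to fit in
    # memoryOverhead. When the pod exceeds the sum, the K8s OOM killer strikes.
    def pod_limit_mb(executor_memory_mb: int, memory_overhead_mb: int) -> int:
        return executor_memory_mb + memory_overhead_mb

    # Settings that made the stages finish (see the next comment):
    print(pod_limit_mb(1400, 2400))  # 3800 MB pod limit, heap capped at 1400 MB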

The process graph was created by the wizard for spectral indices, which makes it easy to create process graphs that turn out to be problematic.

jdries commented 9 months ago

The Spark stages finished without problems with these settings: "executor-memory": "1400m", "executor-memoryOverhead": "2400m".

So overall executor memory did not need an increase; we only need more memoryOverhead.
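
For reference, these are the kind of per-job options that can be passed when submitting the batch job, e.g. via the openeo Python client (a sketch; the keys are the ones quoted above, and whether a given deployment honours them is backend-specific):

    import openeo

    connection = openeo.connect("https://openeo.example.org").authenticate_oidc()
    cube = connection.load_collection("SENTINEL2_L2A", bands=["B03", "B04", "B02"])
    # spatial/temporal extents omitted for brevity

    # Request more memoryOverhead (off-heap) without growing the JVM heap.
    cube.save_result(format="netCDF").execute_batch(
        "out.nc",
        job_options={
            "executor-memory": "1400m",
            "executor-memoryOverhead": "2400m",
        },
    )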

Partition sizes were 32MB, which is quite acceptable; I wouldn't make them smaller. GC times did not increase when reducing the JVM heap size.

This may also be related to the eviction settings: decommissioning was not given a chance to run before the executors were killed.
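
For context, graceful executor decommissioning in Spark is controlled by a handful of standard configuration keys (the values below are illustrative, not this deployment's actual settings), but none of them get a chance to run when the pod is hard-killed by the OOM killer:

    # Standard Spark settings for graceful executor decommissioning
    # (illustrative values; a hard OOM kill bypasses all of this).
    spark_conf = {
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
    }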

jdries commented 9 months ago

I changed the default memory settings, which means we'll only be able to run 3 pods on a 16GB node. This issue is a good basis for further investigation and for subsequently trying to go below a 4GB pod size again.

jdries commented 8 months ago

This issue is quite important because the high memory overhead is making a lot of jobs crash.

One theory is that it's caused by cached GDAL datasets, so we could consider using a smaller cache. I added an env var for that: GDAL_DATASET_CACHE_SIZE
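
The idea behind GDAL_DATASET_CACHE_SIZE, as a stand-alone illustration (the real cache lives in the Scala code of this repo; only the env var name is taken from this thread, everything else is a sketch):

    import os
    from collections import OrderedDict

    from osgeo import gdal


    class DatasetCache:
        """Keeps at most GDAL_DATASET_CACHE_SIZE datasets open at once."""

        def __init__(self) -> None:
            self.max_size = int(os.environ.get("GDAL_DATASET_CACHE_SIZE", "32"))
            self._cache: "OrderedDict[str, gdal.Dataset]" = OrderedDict()

        def get(self, path: str) -> gdal.Dataset:
            if path in self._cache:
                self._cache.move_to_end(path)  # mark as most recently used
                return self._cache[path]
            dataset = gdal.Open(path)  # each open dataset pins native memory
            self._cache[path] = dataset
            if len(self._cache) > self.max_size:
                self._cache.popitem(last=False)  # evict least recently used
            return dataset

A smaller cache means fewer open datasets pinning native memory at once, at the cost of reopening files more often, which is the performance trade-off explored in the comments below.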

EmileSonneveld commented 8 months ago

I reverted some changes for this ticket to get through the integration tests: https://github.com/Open-EO/openeo-geotrellis-extensions/commit/e4ecc450d84b1bc1d3257751d48aa9bba36ca20f

jdries commented 7 months ago

I was able to run the job with much lower memory, with these specific settings, leaving others unchanged:

    memory: 1G
    memoryOverhead: 2G
    - name: GDAL_DATASET_CACHE_SIZE
      value: "8"
    - name: GDAL_CACHEMAX
      value: "80"

Data loading time: 10.7 h. Peak memory recorded: 2.65 GB.

jdries commented 7 months ago

These settings do seem to impact performance:

Data loading time: 6.8 h 
    - name: GDAL_CACHEMAX
      value: "150"
    - name: GDAL_DATASET_CACHE_SIZE
      value: "32"
    memory: 1G
    memoryOverhead: 4G

In this last job, individual pods peaked at 3.9GB (recorded), while most pods stayed below 3GB. This again shows that the OOM problem is really caused by individual outlier pods rather than by overall memory usage.

jdries commented 7 months ago

Total Time Across All Tasks: 9.2 h
    - name: GDAL_DATASET_CACHE_SIZE
      value: "16"
    - name: GDAL_CACHEMAX
      value: "150"
    memory: 1G
    memoryOverhead: 4G

Peak memory: 3.3GB

jdries commented 7 months ago

Total Time Across All Tasks: 8.2 h (memory: 1G, memoryOverhead: 3G)

jdries commented 7 months ago

Total Time Across All Tasks: 7.2 h

jdries commented 7 months ago

Total Time Across All Tasks: 7.6 h (memory: 1G, memoryOverhead: 3G)

jdries commented 7 months ago

Total Time Across All Tasks: 7.4 h (memory: 1G, memoryOverhead: 3G)

jdries commented 7 months ago

Had a working job using below 4GB (one executor went OOM, but the job survived):


      "executor-memory": "1800m",
      "executor-memoryOverhead": "2G",
        "gdal-cachemax":"100",
        "gdal-dataset-cache-size":"12"

Billing

Incurred Costs: 74 credits

Usage Metrics:
CPU usage: 56,890.597 cpu-seconds
Wall time: 1,831 seconds
Input Pixel: 12,953.25 mega-pixel
Memory usage: 182,775,168.592 mb-seconds
Network Received: 6,288,949,351,996 b

jdries commented 3 months ago

The GLI example runs better now, mainly thanks to subsequent improvements to the reading code and to executor decommissioning.