Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

Executor OOM on simple process graph #238

Closed: jdries closed this issue 3 months ago

jdries commented 9 months ago

Executors run out of memory on a relatively simple process graph:

{ "process_graph": { "loadco1": { "arguments": { "bands": [ "B03", "B04", "B02" ], "id": "SENTINEL2_L2A", "spatial_extent": { "east": 13.132874787754762, "north": 42.192832302555644, "south": 41.200957744003716, "west": 11.966782528409608 }

, "temporal_extent": [ "2023-08-18T00:00:00Z", null ] }, "description": "Load the data, including the bands:\r\n- G = B03\r\n- R = B04\r\n- B = B02", "process_id": "load_collection" }, "reduce1": { "arguments": { "data": { "from_node": "loadco1" }

, "dimension": "bands", "reducer": { "process_graph": { "add1": { "arguments": { "x": { "from_node": "multip2" }

, "y": { "from_node": "arraye2" }

}, "process_id": "add" }, "add2": { "arguments": { "x": { "from_node": "add1" }

, "y": { "from_node": "arraye3" }

}, "process_id": "add" }, "arraye1": { "arguments": { "data": { "from_parameter": "data" }

, "index": 0 }, "process_id": "array_element" }, "arraye2": { "arguments": { "data": { "from_parameter": "data" }

, "index": 1 }, "process_id": "array_element" }, "arraye3": { "arguments": { "data": { "from_parameter": "data" }

, "index": 2 }, "process_id": "array_element" }, "divide1": { "arguments": { "x": { "from_node": "subtra2" }

, "y": { "from_node": "add2" }

}, "process_id": "divide", "result": true }, "multip1": { "arguments": { "x": 2, "y": { "from_node": "arraye1" }

}, "process_id": "multiply" }, "multip2": { "arguments": { "x": 2, "y": { "from_node": "arraye1" }

}, "process_id": "multiply" }, "subtra1": { "arguments": { "x": { "from_node": "multip1" }

, "y": { "from_node": "arraye2" }

}, "process_id": "subtract" }, "subtra2": { "arguments": { "x": { "from_node": "subtra1" }

, "y": { "from_node": "arraye3" }

}, "process_id": "subtract" } } } }, "description": "Compute the GLI (Green Leaf Index) for the bands dimension\r\nFormula: (2.0 G - R - B) / (2.0 G + R + B)", "process_id": "reduce_dimension" }, "savere1": { "arguments": { "data": { "from_node": "reduce1" }

, "format": "NETCDF" }, "description": "Store as NETCDF", "process_id": "save_result", "result": true } } }

jdries commented 9 months ago

Some analysis: it's the Kubernetes OOM killer that kills the executors, so the problem is the off-heap memory overhead, which seems to be higher than 1800MB. Maybe GDAL gets a bit too much cache, or the JVM should get a little less heap memory.
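
To spell out the sizing involved (a sketch, assuming the usual Spark-on-Kubernetes layout where the pod memory limit is roughly executor-memory plus memoryOverhead):

    # The JVM heap is capped by executor-memory; everything off-heap (GDAL
    # block cache, cached datasets, JNI/netCDF buffers, ...) has to fit in
    # memoryOverhead. When the pod exceeds the sum, the K8s OOM killer strikes.
    def pod_limit_mb(executor_memory_mb: int, memory_overhead_mb: int) -> int:
        return executor_memory_mb + memory_overhead_mb

    # Settings that made the stages finish (see the next comment):
    print(pod_limit_mb(1400, 2400))  # 3800 MB pod limit, heap capped at 1400 MB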

The process graph was created by the wizard for spectral indices, which makes it easy to create process graphs that turn out to be problematic.

jdries commented 9 months ago

The Spark stages finished without problems with these settings: "executor-memory": "1400m", "executor-memoryOverhead": "2400m".

So overall executor memory did not need an increase; we only need more memoryOverhead.
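
For reference, these are the kind of per-job options that can be passed when submitting the batch job, e.g. via the openeo Python client (a sketch; the keys are the ones quoted above, and whether a given deployment honours them is backend-specific):

    import openeo

    connection = openeo.connect("https://openeo.example.org").authenticate_oidc()
    cube = connection.load_collection("SENTINEL2_L2A", bands=["B03", "B04", "B02"])
    # spatial/temporal extents omitted for brevity

    # Request more memoryOverhead (off-heap) without growing the JVM heap.
    cube.save_result(format="netCDF").execute_batch(
        "out.nc",
        job_options={
            "executor-memory": "1400m",
            "executor-memoryOverhead": "2400m",
        },
    )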

Partition sizes were 32MB, which is quite acceptable; I wouldn't make them smaller. GC times did not increase when reducing the JVM heap size.

This may also be related to the eviction settings: decommissioning was not given a chance to run before the executors were killed.
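
For context, graceful executor decommissioning in Spark is controlled by a handful of standard configuration keys (the values below are illustrative, not this deployment's actual settings), but none of them get a chance to run when the pod is hard-killed by the OOM killer:

    # Standard Spark settings for graceful executor decommissioning
    # (illustrative values; a hard OOM kill bypasses all of this).
    spark_conf = {
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
    }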

jdries commented 9 months ago

I changed the default memory settings, which means we'll only be able to run 3 pods on a 16GB node. This issue is a good basis for further investigation and for subsequently trying to go below a 4GB pod size again.

jdries commented 8 months ago

This issue is quite important because the high memory overhead is making a lot of jobs crash.

One theory is that it's caused by cached GDAL datasets, so we could consider using a smaller cache. I added an env var for that: GDAL_DATASET_CACHE_SIZE
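
The idea behind GDAL_DATASET_CACHE_SIZE, as a stand-alone illustration (the real cache lives in the Scala code of this repo; only the env var name is taken from this thread, everything else is a sketch):

    import os
    from collections import OrderedDict

    from osgeo import gdal


    class DatasetCache:
        """Keeps at most GDAL_DATASET_CACHE_SIZE datasets open at once."""

        def __init__(self) -> None:
            self.max_size = int(os.environ.get("GDAL_DATASET_CACHE_SIZE", "32"))
            self._cache: "OrderedDict[str, gdal.Dataset]" = OrderedDict()

        def get(self, path: str) -> gdal.Dataset:
            if path in self._cache:
                self._cache.move_to_end(path)  # mark as most recently used
                return self._cache[path]
            dataset = gdal.Open(path)  # each open dataset pins native memory
            self._cache[path] = dataset
            if len(self._cache) > self.max_size:
                self._cache.popitem(last=False)  # evict least recently used
            return dataset

A smaller cache means fewer open datasets pinning native memory at once, at the cost of reopening files more often, which is the performance trade-off explored in the comments below.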

EmileSonneveld commented 8 months ago

I reverted some changes for this ticket to get through the integration tests: https://github.com/Open-EO/openeo-geotrellis-extensions/commit/e4ecc450d84b1bc1d3257751d48aa9bba36ca20f

jdries commented 7 months ago

I was able to run the job with much lower memory, with these specific settings, leaving others unchanged:

    memory: 1G
    memoryOverhead: 2G
    - name: GDAL_DATASET_CACHE_SIZE
      value: "8"
    - name: GDAL_CACHEMAX
      value: "80"

Data loading time: 10.7 h. Peak memory recorded: 2.65 GB.

jdries commented 7 months ago

These settings do seem to impact performance:

Data loading time: 6.8 h 
    - name: GDAL_CACHEMAX
      value: "150"
    - name: GDAL_DATASET_CACHE_SIZE
      value: "32"
    memory: 1G
    memoryOverhead: 4G

In this last job, individual pods peaked at 3.9GB (recorded), while most pods stayed below 3GB. This again shows that the OOM problem is really caused by individual outlier pods rather than by overall memory usage.

jdries commented 7 months ago

Total Time Across All Tasks: 9.2 h
    - name: GDAL_DATASET_CACHE_SIZE
      value: "16"
    - name: GDAL_CACHEMAX
      value: "150"
    memory: 1G
    memoryOverhead: 4G

Peak memory: 3.3GB

jdries commented 7 months ago

Total Time Across All Tasks: 8.2 h (memory: 1G, memoryOverhead: 3G)

jdries commented 7 months ago

Total Time Across All Tasks: 7.2 h

jdries commented 7 months ago

Total Time Across All Tasks: 7.6 h (memory: 1G, memoryOverhead: 3G)

jdries commented 7 months ago

Total Time Across All Tasks: 7.4 h (memory: 1G, memoryOverhead: 3G)

jdries commented 7 months ago

Had a working job using below 4GB (one executor went OOM, but the job survived):


      "executor-memory": "1800m",
      "executor-memoryOverhead": "2G",
        "gdal-cachemax":"100",
        "gdal-dataset-cache-size":"12"

Billing

Incurred Costs: 74 credits

Usage Metrics:
CPU usage: 56,890.597 cpu-seconds
Wall time: 1,831 seconds
Input Pixel: 12,953.25 mega-pixel
Memory usage: 182,775,168.592 mb-seconds
Network Received: 6,288,949,351,996 b

jdries commented 3 months ago

The GLI example runs better now, mainly thanks to subsequent improvements to the reading code and to executor decommissioning.