JeroenVerstraelen opened this issue 3 weeks ago
It's good that we have a user queue system in our Hadoop deployment, but it was confusing for users why their jobs stayed in the queued state for so long. A further issue was that some jobs were hanging without the user tracking them.
Best to stick with the YARN queue system and add more documentation around this feature for users.
The YARN deployment currently does not have a concurrent_job_limit like we have in Kubernetes with the concurrent_pod_limit config. Last week a user was able to submit ~80 Spark jobs on the Hadoop cluster before openEO jobs started failing with a "hadoop queue limit exceeded" message.
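A per-user limit on the YARN side could be enforced at submission time, similar to what concurrent_pod_limit does for Kubernetes. The sketch below is only illustrative: the names concurrent_job_limit, JobLimitExceeded, and submit_job are hypothetical and not part of the existing codebase; it just shows the shape of a client-side guard that rejects submissions once a user's active-job count reaches the configured limit.

```python
# Hypothetical sketch of a per-user concurrent job limit for the YARN
# deployment, mirroring concurrent_pod_limit on Kubernetes.
# All names here (concurrent_job_limit, JobLimitExceeded, submit_job)
# are illustrative, not existing openEO APIs.

class JobLimitExceeded(Exception):
    """Raised when a user already has the maximum number of active jobs."""


def submit_job(user_id: str, job_id: str, running_counts: dict,
               concurrent_job_limit: int = 20) -> str:
    """Submit a job unless the user has hit the concurrent job limit.

    running_counts maps user_id -> number of currently active jobs;
    in a real deployment this would come from the job tracker/YARN RM.
    """
    active = running_counts.get(user_id, 0)
    if active >= concurrent_job_limit:
        # Reject early instead of letting YARN fail later with a
        # "queue limit exceeded" error after ~80 submissions.
        raise JobLimitExceeded(
            f"user {user_id} has {active} active jobs "
            f"(limit {concurrent_job_limit})"
        )
    running_counts[user_id] = active + 1
    return f"submitted:{job_id}"
```

Checking the limit before the job ever reaches YARN means the user gets an immediate, understandable error instead of a queued job that later fails with a cluster-side queue error.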