JeroenVerstraelen opened this issue 3 weeks ago
It's good that we have a user queue system in our Hadoop deployment, but it was confusing for users why their jobs stayed in the queued state for so long. A further issue was that some jobs were hanging without the user tracking them.
Best to stick with the YARN queue system and add more documentation around this feature for users.
The YARN deployment currently does not have a concurrent_job_limit like we have in Kubernetes with the concurrent_pod_limit config. Last week a user was able to submit ~80 Spark jobs on the Hadoop cluster before openEO jobs started failing with a "hadoop queue limit exceeded" message.
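A per-user limit on the YARN side could be enforced at submission time, similar to what concurrent_pod_limit does for Kubernetes. The sketch below is only illustrative: the names concurrent_job_limit, JobLimitExceeded, and submit_job are hypothetical and not part of the existing codebase; it just shows the shape of a client-side guard that rejects submissions once a user's active-job count reaches the configured limit.

```python
# Hypothetical sketch of a per-user concurrent job limit for the YARN
# deployment, mirroring concurrent_pod_limit on Kubernetes.
# All names here (concurrent_job_limit, JobLimitExceeded, submit_job)
# are illustrative, not existing openEO APIs.

class JobLimitExceeded(Exception):
    """Raised when a user already has the maximum number of active jobs."""


def submit_job(user_id: str, job_id: str, running_counts: dict,
               concurrent_job_limit: int = 20) -> str:
    """Submit a job unless the user has hit the concurrent job limit.

    running_counts maps user_id -> number of currently active jobs;
    in a real deployment this would come from the job tracker/YARN RM.
    """
    active = running_counts.get(user_id, 0)
    if active >= concurrent_job_limit:
        # Reject early instead of letting YARN fail later with a
        # "queue limit exceeded" error after ~80 submissions.
        raise JobLimitExceeded(
            f"user {user_id} has {active} active jobs "
            f"(limit {concurrent_job_limit})"
        )
    running_counts[user_id] = active + 1
    return f"submitted:{job_id}"
```

Checking the limit before the job ever reaches YARN means the user gets an immediate, understandable error instead of a queued job that later fails with a cluster-side queue error.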