On a worker node without hard memory limits, cmsRun may occasionally cause an out-of-memory (OOM) situation that leads to a system process being killed by the kernel.
The kernel can be "encouraged" to kill a cmsRun process instead of some other process by setting /proc/PID/oom_score_adj to a value larger than 0, up to 1000 (see man oom_score_adj).
As this needs to be set for each process, would it make sense to let cmsRun set the value itself when it starts, by writing to /proc/self/oom_score_adj?
We could use something like process.options.oomScoreAdjust to make it configurable, and start with a default value between 100 (somewhat more likely) and 500 (much more likely).
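To make the idea concrete, here is a minimal sketch of what the startup step could look like. The function name setOomScoreAdjust and its path parameter are hypothetical (nothing like this exists in the framework as far as this issue states), and the way the value would be read from process.options.oomScoreAdjust is left out. The sketch only allows raising the score (0 to 1000), which an unprivileged process is permitted to do.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper, not an existing CMSSW function.
// Writes `value` to the oom_score_adj file of the current process.
// The `path` parameter exists only so the function can be exercised
// against a regular file; the default is the real procfs entry.
// Returns true on success, false if the value is out of the range
// accepted by this sketch or if the write fails.
bool setOomScoreAdjust(int value, const std::string& path = "/proc/self/oom_score_adj") {
  // The kernel accepts -1000..1000; this sketch restricts itself to
  // 0..1000, i.e. making the process a more likely OOM victim.
  if (value < 0 || value > 1000) {
    return false;
  }
  std::ofstream out(path);
  if (!out) {
    return false;  // e.g. not running on Linux, or procfs not mounted
  }
  out << value << '\n';
  out.flush();
  return out.good();
}
```

Calling setOomScoreAdjust(300) early in the job would correspond to the "somewhat more likely" end of the proposed default range. A failed write would probably be best treated as a warning rather than a fatal error, since the job itself can still run.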
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.