Hello @adamruzicka, should this be fixed in the product? I mean, how do we limit Dynflow memory then? I think there is some periodic cleanup of Dynflow executors, so if pausing the task is caused by killing the executor, this is not an ultimate fix, although I agree it might fix most of the problems.
6.8 and newer will have the "new dynflow" with sidekiq, redis and separate worker processes. With this kind of deployment, those env variables won't do anything. The upside here is that the workers are stateless and can be restarted without causing any issues, which wasn't the case with the previous way of deploying it. From my testing, I haven't seen them consume too much memory, but if someone still wants to limit their memory usage, I'd suggest leveraging systemd's resource control, primarily `MemoryHigh` and `MemoryMax` if needed.
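For illustration, a minimal sketch of what such a systemd drop-in could look like. The unit name `dynflow-sidekiq@worker` and the 2G/3G limits are assumptions on my part (the actual unit names and sensible limits depend on your deployment), so treat this as a starting point, not a recommendation:

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/dynflow-sidekiq@worker.service.d/memory.conf
# Unit name and limits are placeholders -- adjust to your deployment.
[Service]
# Soft limit: the kernel starts reclaiming/throttling the worker above this.
MemoryHigh=2G
# Hard limit: the worker is killed by the kernel if it exceeds this.
MemoryMax=3G
```

After creating the drop-in, `systemctl daemon-reload` and restarting the worker unit would apply it. Since the new workers are stateless, a kill triggered by `MemoryMax` should be far less disruptive than the old executor killing itself, though whether the worker comes back automatically still depends on the unit's restart policy.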
Before 6.8 things are not as great. There have been cases of executors consuming too much memory, so we came up with the idea that the process would monitor its own memory usage and kill itself if it grew too much. The gotcha here was that it could (and often did) leave tasks in a paused state (the infamous abnormal termination error) and users had to deal with those tasks by hand. Another issue is that the tuning guide suggests 3 GB as the limit. If you are at a scale where you need to follow the tuning guide, then 3 GB probably won't cut it for you and the limit will do you more harm than good. I just wouldn't recommend using these options unless people know what they are doing and understand the issues they can cause. I wouldn't mind if people used this sensibly, but it is hard to set any kind of guidelines, because it depends on the workload.
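For context, the pre-6.8 knobs are plain environment variables read by the executor process. A minimal sketch of how they might be set; the file path and the names of the two companion variables are assumptions here (only `EXECUTOR_MEMORY_LIMIT` is named in this thread), so check your version's tuning guide:

```sh
# Hypothetical sketch, e.g. /etc/sysconfig/dynflowd -- path and companion
# variable names are assumptions, not confirmed by this thread.

# The executor kills itself once its memory usage crosses this limit
# (pre-6.8 behaviour, which is what can leave tasks paused).
EXECUTOR_MEMORY_LIMIT=2gb

# Two related parameters control when the self-check starts after boot and
# how often it runs; the names below are assumed for illustration.
EXECUTOR_MEMORY_MONITOR_DELAY=3600
EXECUTOR_MEMORY_MONITOR_INTERVAL=60
```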
Now to answer your questions. Should it be fixed in the product? Yes and no: the options will still be there and will keep working until people upgrade to 6.8. We determined this wasn't fixable with just minor changes, so we made more radical ones, and those will land in 6.8, where we should be able to leverage systemd to do it for us.
Of course this is not a silver bullet and there's only so much we can do when the process gets killed with SIGKILL, for example. We try to do our best to recover from such situations, but sometimes it is safer to just pause and let the user decide what to do.
I see. Got it, Pavel and Adam! I was thinking about `EXECUTOR_MEMORY_LIMIT` more as a safety net in case some bug causes excessive memory usage, but it looks like we should not recommend it by default without knowledge of the specific workload. I agree with merging this fix. Thank you!
Using EXECUTOR_MEMORY_LIMIT and two similar parameters can leave some tasks in a paused state forever. Therefore it is not recommended to use those parameters.
Resolves: #4
Signed-off-by: Pavel Moravec pmoravec@redhat.com