Closed gschwind closed 2 years ago
Hello,
While the patch improves the situation, it leaves one issue open: the status file of a crashed process is not updated.
The issue may be difficult to solve because of how storage is currently handled. In my case, where status documents are stored as files, I may be able to solve it, but in the general case #354 will be the solution.
Best regards
Overview
Improve the management of sub-processes on the Linux platform.
As described in #493, a process may crash while the database still records it as running. At some point the server stops accepting new requests because it has reached the maximum parallel request limit. This patch series aims to handle that situation more properly on the Linux platform. As a preliminary requirement, the patch series must fix an issue with the MultiProcessing module, which can leave zombie processes around forever. They show up, for instance, in the output of:
ps -f -u apache
I think the issue is not limited to apache. To fix it, this patch series provides a new DetachProcessing mode that actually detaches the processing of a request, ensuring that the new process becomes a child of pid 1. This ensures that processes will not end up as zombies. DetachProcessing should be good for any use case on Linux, i.e. with servers other than apache.
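To illustrate what detaching means here, the following is a minimal sketch of the classic double-fork technique on Linux. It is not the code of this patch series: `run_detached` and its arguments are hypothetical names chosen for the example. The intermediate child exits immediately and is reaped by the caller, so the worker process is re-parented (typically to pid 1) and cannot remain a zombie of the server.

```python
import os


def run_detached(target, *args):
    """Run target(*args) in a process that is not a child of the caller.

    Hypothetical helper for illustration; returns the pid of the worker.
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Intermediate child: new session, fork the worker, exit at once.
        os.close(read_fd)
        os.setsid()
        worker = os.fork()
        if worker == 0:
            # Worker: orphaned as soon as the intermediate child exits.
            os.close(write_fd)
            status = 0
            try:
                target(*args)
            except BaseException:
                status = 1
            finally:
                os._exit(status)
        # Report the worker pid to the original caller, then exit.
        os.write(write_fd, str(worker).encode())
        os._exit(0)
    # Original caller: read the worker pid and reap the intermediate child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        worker_pid = int(pipe.read())
    os.waitpid(child, 0)
    return worker_pid
```

Because the worker is no longer a child of the caller, the caller never has to `wait` for it, and the pid returned here is the value one would store alongside the request.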
Thus, for the patch series to work, the configuration must use the detachprocessing mode; otherwise terminated processes cannot be detected. The heuristic used in the patch series is basically to check whether a process exists with the stored pid. If the pid is gone, we are sure that the process is no longer running. If the pid is still there, we cannot be sure, because Linux may reuse the pid for another process. This is why the heuristic is safe (it never reports a running process as terminated) but not 100% accurate.
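A pid existence check of this kind can be sketched as follows. This is an assumption about how such a check may look, not the exact code of the patch; `pid_exists` is a hypothetical name. It relies on the fact that sending signal 0 performs the existence and permission checks without delivering any signal.

```python
import os


def pid_exists(pid):
    """Return True if some process currently has this pid (hypothetical helper)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: check only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process: it has definitely terminated
    except PermissionError:
        return True  # a process exists but belongs to another user
    return True
```

Note that a zombie still owns its pid, so this check reports it as existing until its parent reaps it. A `True` result may also come from an unrelated process that reused the pid, which is the inaccuracy described above.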
I tried several more accurate heuristics, but sometimes they do not work and at other times they make things much more complex.
Moreover, the patch checks and cleans up sub-processes only when pywps reaches the max parallel process limit. This means that a crashed process may be considered unfinished for a long time, i.e. until that limit is reached. It would be better to also check at every status request, but implementing this requires addressing #354 first.
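The cleanup policy described above can be sketched as below, under stated assumptions: the running requests are modelled as a plain dict mapping a request identifier to its stored pid, and `cleanup_if_full`, `running`, `max_parallel` and `is_alive` are hypothetical names that do not come from the pywps code base.

```python
def cleanup_if_full(running, max_parallel, is_alive):
    """Drop entries whose process is gone, but only once the limit is reached.

    running:      dict mapping request id -> stored pid (modified in place)
    max_parallel: maximum number of parallel requests
    is_alive:     callable taking a pid and returning True if it still exists
    Returns the list of request ids that were removed.
    """
    if len(running) < max_parallel:
        # Below the limit nothing is checked, so a crashed process
        # stays listed as running until the limit is hit.
        return []
    dead = [request_id for request_id, pid in running.items() if not is_alive(pid)]
    for request_id in dead:
        del running[request_id]
    return dead
```

Calling the same function on every status request, without the early return, would shorten the time a crashed process is reported as running, which is the improvement that depends on #354.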
Best regards
Related Issue / Discussion
This is related to #493
Additional Information
Contribution Agreement
(as per https://github.com/geopython/pywps/blob/master/CONTRIBUTING.rst#contributions-and-licensing)