cloudfoundry-community / bosh-gen

Rapid generation of BOSH releases
MIT License

wait_pid() kills just the root process, not its children as well #85

Closed milkotodorov closed 7 years ago

milkotodorov commented 8 years ago

Hello,

The wait_pid() function in ctl_utils.sh kills only the root process, not the processes started by it. If the start script launches another script, which in turn launches another, and so on, stopping the root process does not stop all the child processes.

It would be nice if this were extended so that the whole tree is killed.

Regards, Milko

drnic commented 7 years ago

@milkotodorov sorry for not seeing this issue til now; do you have any suggestions for how the processes should be cleanly handled?

jhunt commented 7 years ago

Short of pulling the whole subtree of pids via some ps+grep hackery, I don't know of any way to do this outside of C (or any language with ready access to both /proc and the raw system calls behind the kill / killall utilities).

Ideally, the root process would, upon receiving SIGTERM, turn around and kill its children, who would in turn kill their children, etc, etc.
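A minimal sketch of that "forward SIGTERM to your children" idea in shell, assuming the wrapper starts a single background child. The function name `run_and_forward_term` is hypothetical and is not part of ctl_utils.sh; exit-status handling is deliberately left out.

```bash
# Hypothetical helper: run a command as a child and pass SIGTERM on to it.
run_and_forward_term() {
  # Start the real workload in the background and remember its PID.
  "$@" &
  local child=$!

  # On SIGTERM, forward the signal to the child.
  # $child is expanded now, at the time the trap is installed.
  trap "kill -TERM $child 2>/dev/null" TERM

  # wait returns early when a trapped signal arrives, so wait a second
  # time to reap the child after the handler has run.
  wait "$child" || true
  wait "$child" 2>/dev/null || true

  # Restore the default SIGTERM behaviour.
  trap - TERM
}
```

For example, `run_and_forward_term sleep 300 &` followed by `kill -TERM $!` should end both the wrapper and the `sleep`. The cascade only reaches deeper levels if every process in the chain does the same for its own children.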

milkotodorov commented 7 years ago

@drnic ideally, killing the root process should kill its children too. Since we can find the children of a process, we can iterate recursively and kill them all. As it stands, the child processes are not terminated.

drnic commented 7 years ago

@milkotodorov do you have some code that works for you in other places that you'd like to see in all bosh releases?

milkotodorov commented 7 years ago

@drnic I had an issue with a bash script that started a Java app. The PID that was recorded was the script's, and killing that PID didn't kill the Java process. My solution was to record the Java PID instead and kill that. For a better solution, I think something like a kill_pid_tree() function could be introduced.
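A rough sketch of what such a kill_pid_tree() could look like. Only the function name comes from the comment above; the body is an assumption, not code from bosh-gen. It finds children by parent PID using the POSIX `ps -e -o pid= -o ppid=` output and recurses.

```bash
# Hypothetical helper: signal a process and all of its descendants.
# Usage: kill_pid_tree <pid> [signal]   (signal defaults to TERM)
kill_pid_tree() {
  local pid=$1
  local signal=${2:-TERM}
  local children child

  # Snapshot the direct children before signalling the parent: once the
  # parent exits, its children are re-parented and can no longer be
  # found by looking up this parent PID.
  children=$(ps -e -o pid= -o ppid= | awk -v parent="$pid" '$2 == parent { print $1 }')

  # Signal the process itself; ignore the error if it is already gone.
  kill -s "$signal" "$pid" 2>/dev/null || true

  # Recurse into each child, which repeats the same steps one level down.
  for child in $children; do
    kill_pid_tree "$child" "$signal"
  done
}
```

This walk is inherently racy: a process that forks between the `ps` snapshot and the `kill` can leave a descendant behind, so it is a best-effort cleanup rather than a guarantee.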

jhunt commented 7 years ago

Typically that's why you should exec from the bash script after storing the process ID. The Java process then replaces the script process and keeps its PID, so the recorded PID is the correct one.
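A small sketch of that pattern, assuming a pidfile path and command supplied by the caller. The helper name `start_with_exec` is hypothetical; in a real ctl script the two inner steps (`echo $$ > pidfile`, then `exec <command>`) would simply be the last lines of the script.

```bash
# Hypothetical helper: start a command so that the recorded PID is the
# PID of the command itself rather than of a wrapper shell.
# Usage: start_with_exec <pidfile> <command> [args...]
start_with_exec() {
  local pidfile=$1
  shift

  # The inner shell writes its own PID ($$) to the pidfile, then uses
  # exec to replace itself with the target command. exec does not fork,
  # so the command keeps the PID that was just recorded.
  sh -c 'echo "$$" > "$1"; shift; exec "$@"' sh "$pidfile" "$@" &
}
```

After `start_with_exec /tmp/app.pid sleep 300`, the PID in `/tmp/app.pid` belongs to `sleep`, so a plain `kill "$(cat /tmp/app.pid)"` stops the actual workload.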

jhunt commented 7 years ago

@milkotodorov were you able to test if using exec works for you in this scenario?

milkotodorov commented 7 years ago

yes - it worked. Thanks

drnic commented 7 years ago

Related to this topic: I am exploring the new https://github.com/cloudfoundry-incubator/bpm-release/ for wrapping job processes inside containers; it will mean we get rid of all our wrapper scripts.

WIP PR at https://github.com/cloudfoundry-community/bosh-gen/pull/102