Atoptool / atop

System and process monitor for Linux
GNU General Public License v2.0

fix atopacctd.c: failed to start atopacct.service #255

Closed. liutingjieni closed this 1 year ago

liutingjieni commented 1 year ago

atopacct.service is of type "forking": if the parent process does not exit within 90 seconds of starting, the service is considered to have failed to start. After fork(), the child process can be scheduled before the parent. In that case the child reaches kill() and sends its signal to the parent before the parent has been scheduled again. When the parent does run, it first executes the signal handler and only afterwards reaches pause(). Because the signal has already been handled and no further signal arrives, the parent blocks in pause() indefinitely and never exits, so atopacct.service fails to start.
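A common way to close this kind of window is to block the signal before fork() and wait with sigsuspend() instead of pause(). The following is only a minimal sketch of that general technique, not the change made in this pull request; the names `on_ready`, `child_ready`, and `wait_for_child_ready` are hypothetical, and the use of SIGTERM as the "ready" signal is taken from the status message quoted later in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler once the child's "ready" signal has arrived. */
static volatile sig_atomic_t child_ready = 0;

static void on_ready(int sig)
{
    (void)sig;
    child_ready = 1;
}

/*
 * Hypothetical helper: fork a child that signals its parent with SIGTERM,
 * and wait for that signal without the pause() race.
 * Returns 0 once the signal has been seen, -1 on error.
 */
int wait_for_child_ready(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old, waitmask;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_ready;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGTERM before fork() so that it stays pending if the
     * child happens to run (and signal) before the parent. */
    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    child_ready = 0;
    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Child: real initialization would go here. */
        kill(getppid(), SIGTERM);
        _exit(0);
    }

    /* Parent: sigsuspend() unblocks SIGTERM and sleeps in one atomic
     * step, so a signal sent at any earlier moment is delivered here. */
    waitmask = old;
    sigdelset(&waitmask, SIGTERM);
    while (!child_ready)
        sigsuspend(&waitmask);

    /* Restore the previous state; a real daemon parent would exit here. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGTERM, &old_sa, NULL);
    waitpid(pid, NULL, 0);
    return 0;
}
```

Calling `wait_for_child_ready()` from a small `main()` returns 0 regardless of whether the child or the parent is scheduled first after fork(). The flag check in the loop also guards against wake-ups caused by unrelated signals.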

liutingjieni commented 1 year ago

When images are created in bulk at ByteDance, their startup takes longer, and we found that atopacct.service fails to start. We investigated further and found the cause in the signalling between the parent and child processes of atopacctd: the kill() did not have the intended effect, so the parent never exited.

Hope to get your reply.

Atoptool commented 1 year ago

This is clearly a timing issue that was not anticipated. I have just added a signal catcher for the parent itself, so that systemctl status does not report "atopacctd (code=killed, signal=TERM)".
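For readers unfamiliar with the idea, a signal catcher of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the code that was committed: the handler `parent_done` and the wrapper `launcher_exit_status` are hypothetical names, and the wrapper exists only so the behaviour can be observed from a test. Because the handler itself ends the process with status 0, it does not matter whether the signal arrives before or after the parent reaches pause().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child's SIGTERM means "initialized", so the launching process
 * exits with status 0. _exit() is async-signal-safe. */
static void parent_done(int sig)
{
    (void)sig;
    _exit(0);
}

/*
 * Hypothetical demo: run a launcher/daemon pair in a subprocess and
 * return the launcher's exit status (0 expected), or -1 on error or
 * if the launcher was terminated by a signal.
 */
int launcher_exit_status(void)
{
    int status;
    pid_t launcher = fork();

    if (launcher == -1)
        return -1;

    if (launcher == 0) {
        struct sigaction sa;
        sigset_t unblock;
        pid_t daemon;

        /* Install the catcher BEFORE forking the daemon process. */
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = parent_done;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGTERM, &sa, NULL) == -1)
            _exit(2);

        /* Make sure SIGTERM is deliverable in this process. */
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGTERM);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);

        daemon = fork();
        if (daemon == -1)
            _exit(2);
        if (daemon == 0) {
            /* Daemon: real initialization would go here. */
            kill(getppid(), SIGTERM);
            _exit(0);
        }

        /* Launcher: the handler ends the process whenever SIGTERM
         * arrives, whether before or during pause(). */
        for (;;)
            pause();
    }

    if (waitpid(launcher, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the catcher installed, the launcher terminates through a normal exit with status 0 rather than through the default SIGTERM action, which is the distinction a service manager reports as "exited" versus "killed".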