TritonDataCenter / smartos-live

For more information, please see http://smartos.org/ For any questions that aren't answered there, please join the SmartOS discussion list: https://smartos.topicbox.com/groups/smartos-discuss
1.57k stars 246 forks source link

Add workaround for 90 second LX systemd stop/reboot delay #1022

Open smokris opened 2 years ago

smokris commented 2 years ago

This PR modifies vmadm to send the systemd-shutdown process SIGCHLD periodically during a vmadm stop or vmadm reboot command, which causes systemd-shutdown to re-check its list of remaining processes, thereby removing the unnecessary delay.

Prior to this change, vmadm stop and vmadm reboot were taking slightly more than 90 seconds. With this change, each takes about 10 seconds.

See discussion and @danmcd's initial testing on https://smartos.topicbox.com/groups/smartos-discuss/T7eaf4fad91e64ad8-M2e374d3cdfd9366191bc09b7.

smokris commented 2 years ago

Thanks for the feedback, @danmcd.

how are you sure that vmobj.pid is the pid for systemd-shutdown?

If I understand correctly, vmobj.pid is the PID of the init process for the zone.

During shutdown, systemd replaces the init process image with systemd-shutdown (specifically, it calls execve() without a fork()), reusing the same PID.

What if you're sending SIGCHLD to something else? What about older or alternate LX images that don't have modern systemd?

Ah, good point. To reduce that risk, I could have it check the process's comm and only send the signal if it's systemd-shutdown.

danmcd commented 2 years ago

@bahamat and I had a chat offline... we're going to look a little deeper into this problem. Your analysis of how systemd-shutdown behaves is EXCELLENT, but we're wondering if this fix belongs somewhere lower in the stack, so zoneadm(8) can DTRT as well.

smokris commented 2 years ago

DAMN...

@danmcd, heh, no problem. (I had first looked at /proc/PID/argv, then realized /proc also provided a symlink to the binary and figured it would be slightly simpler/less-error-prone to check that link, rather than parsing argv.)

I've been testing on PI 20220324T002253Z using a lofs bind mount to override VM.js. So far I've tested successfully with a modified image 3dbbdcca-2eab-11e8-b925-23bf77789921 (CentOS 7, which I updated to CentOS Stream 8 running systemd-v239).

we're going to look a little deeper into this problem

Sounds good; keep me posted.

danmcd commented 2 years ago

I'm not sure if this requires fixes in both VM.js and in zone brand scripts. This one may get delayed for a bit; I'm sorry.

bahamat commented 2 years ago

I think if zoneadm handles this properly, vmadm should be fine as-is.