Azure / iotedge

The IoT Edge OSS project
MIT License

Fix snap startup error #7351

Closed (damonbarry closed 3 months ago)

damonbarry commented 3 months ago

A recent fix to the azure-iot-edge snap (#7330) changed the daemon type of the docker-proxy service to "notify" to resolve install/startup timing issues. With that fix, docker-proxy now waits until it can establish communication with Docker before calling systemd-notify --ready, which tells systemd the service is ready and allows aziot-edged to start.

A problem was discovered once the updated snap was published to the marketplace. If a user installs the snap without --devmode, docker-proxy startup fails with: "Got notification message from PID nnnn, but reception only permitted for main PID mmmm". This happens because systemd expects the readiness notification to come from the service's main process, but bash runs systemd-notify --ready in a separate child process. Systemd can be configured to accept notifications from other processes, but snapd doesn't expose those options.
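
The PID mismatch can be reproduced without systemd. The following is a minimal sketch, not code from the snap: `report_pids` is a hypothetical helper that prints the PID of a script's shell (standing in for the service's main process) and then the PID of a command that the script launches in the ordinary way, without exec.

```sh
#!/bin/sh
# Hypothetical helper (not part of the snap): show that a command launched
# from a script normally runs under a different PID than the script itself.
report_pids() {
    # The outer shell plays the role of the service's main process.
    # The inner shell is launched as an ordinary child, so it gets a new PID.
    # The trailing ':' keeps the outer shell from exec-ing its last command,
    # an optimization some shells apply automatically.
    sh -c 'echo "main=$$"; sh -c "echo child=\$\$"; :'
}

report_pids
```

The two reported values differ, which is the same situation systemd rejects when the notification arrives from a PID other than the service's main PID.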

This change uses the shell builtin exec command to run systemd-notify in the same process as the parent script, so the notification comes from the service's main PID. It also uses the --exec option on systemd-notify to launch socat once the notification has been sent, again without changing the PID. The --exec option is relatively new, so the snap has been upgraded to the core24 base.
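
A minimal sketch of this pattern, under stated assumptions: `wait_for_docker` and `notify_then_run` are hypothetical names, the readiness probe (`docker version`) and the socat arguments are placeholders, and the actual script in the snap may differ. The `;` separator after `--exec` follows the systemd-notify man page, which requires it between systemd-notify's own arguments and the command to execute.

```sh
#!/bin/sh
# Hypothetical sketch; the function names, the readiness probe, and the socat
# arguments are illustrative assumptions, not the snap's actual script.

# Block until the Docker daemon answers (assumed probe).
wait_for_docker() {
    until docker version >/dev/null 2>&1; do
        sleep 1
    done
}

# Replace the current shell with systemd-notify (exec keeps the same PID).
# systemd-notify sends the ready notification and then, because of --exec,
# replaces itself with the command given in "$@".
# The escaped ';' separates systemd-notify's arguments from that command.
notify_then_run() {
    exec systemd-notify --ready --exec \; "$@"
}

# Example wiring, commented out so the sketch has no side effects:
# wait_for_docker
# notify_then_run socat <listen-address> <docker-address>
```

If the script is the service's main process, both the notification and the socat process that follows carry that same PID, because each step replaces the running process instead of starting a child.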

Also, the configure hook was producing config files with empty hostnames, which caused failures on startup. It's unclear why the hostnamectl command isn't working in this instance (possibly something to do with the core24 upgrade?), but it turns out to be unnecessary because iotedge config apply will correctly determine/populate the hostname a little later. This change removes the logic to populate the hostname field from the configure hook.

To test, I built new snaps in the CI build pipeline, ran the end-to-end tests pipeline against them, and confirmed the snap jobs pass. Since the snap jobs install with --devmode, I also published to a temporary branch in the marketplace and tested manually.

damonbarry commented 3 months ago

FYI @alexclewontin

alexclewontin commented 3 months ago

@damonbarry ACK, I'll take a look shortly