Azure / iotedge

The IoT Edge OSS project
MIT License
1.47k stars 462 forks source link

iotedge fails to start after upgrade to moby & moby-cli(3.0.9+azure38) #2354

Closed JeeWeetje closed 4 years ago

JeeWeetje commented 4 years ago

Expected Behavior

iotedge list should have returned a list of running modules (edgeAgent & edgeHub in this case

Current Behavior

Tell us what happens instead of the expected behavior

Instead of a list of modules iotedge list shows an error:

$ iotedge list
A module runtime error occurred
        caused by: Could not list modules
        caused by: an IO error occurred: Connection reset by peer (os error 104)

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Get a Raspberry Pi 2 or 3, install Raspbian Stretch 9.11 on, install moby, moby-cli, iotedge 1.0.9-rc4 and all the needed packages on it
  2. deploy an empty manifest from VS Code with only edgeAgent and edgeHub included
  3. iotedge list on the Raspberry Pi shows edgeAgent & edgeHub
  4. hit sudo apt update, sudo apt upgrade on the Raspberry Pi
  5. You will notice moby and moby-cli 3.0.9+azure38 will be installed among the updated packages
  6. iotedge list now shows an error
  7. sudo docker version shows an error half way the information
  8. systemctl restart iotedge fails

Context (Environment)

Hardware: Raspberry Pi 2, ethernet wired to DSL router, TMP via i2c installed and used for DPS registration. RPi is used from home automation with smart meter for electricity/gas and to control smart lights via a Philips Hue bridge. OS and packages always on par with updates (per day), iotedge running latest public version (not stable but preview releases).

Output of iotedge check

Click here ``` $ sudo iotedge check Configuration checks -------------------- √ config.yaml is well-formed - OK × config.yaml has well-formed connection string - Error Device is not using manual provisioning, so Azure IoT Hub hostname needs to be specified with --iothub-hostname × container engine is installed and functional - Error Could not communicate with container engine at unix:///var/run/docker.sock. Please check your moby-engine installation and ensure the service is running. √ config.yaml has correct hostname - OK ‼ latest security daemon - Warning Installed IoT Edge daemon has version 1.0.9~rc4 but 1.0.8 is the latest stable version available. Please see https://aka.ms/iotedge-update-runtime for update instructions. √ host time is close to real time - OK × DNS server - Error Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub. Please see https://aka.ms/iotedge-prod-checklist-dns for best practices. You can ignore this warning if you are setting DNS server per module in the Edge deployment. × IPv6 network configuration - Error Container engine is not configured for IPv6 communication. Please see https://aka.ms/iotedge-docker-ipv6 for a guide on how to enable IPv6 support. ‼ production readiness: certificates - Warning The Edge device is using self-signed automatically-generated development certificates. They will expire in 29 days (at 2020-02-20 18:24:37 UTC) causing module-to-module and downstream device communication to fail on an active deployment. After the certs have expired, restarting the IoT Edge daemon will trigger it to generate new development certs with 90-day expiry. Please consider using production certificates instead. See https://aka.ms/iotedge-prod-checklist-certs for best practices. × production readiness: logs policy - Error Container engine is not configured to rotate module logs which may cause it run out of disk space. Please see https://aka.ms/iotedge-prod-checklist-logs for best practices. You can ignore this warning if you are setting log policy per module in the Edge deployment. Connectivity checks ------------------- √ host can connect to and perform TLS handshake with DPS endpoint - OK 4 check(s) succeeded. 2 check(s) raised warnings. Re-run with --verbose for more details. 5 check(s) raised errors. Re-run with --verbose for more details. 14 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details. ```

Device Information

Runtime Versions

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

iotedged logs ``` $ sudo systemctl status iotedge ● iotedge.service - Azure IoT Edge daemon Loaded: loaded (/lib/systemd/system/iotedge.service; enabled; vendor preset: enabled) Active: active (running) since Wed 2020-01-22 22:55:25 CET; 6s ago Docs: man:iotedged(8) Main PID: 16250 (iotedged) Tasks: 10 (limit: 4915) Memory: 2.1M CPU: 286ms CGroup: /system.slice/iotedge.service └─16250 /usr/bin/iotedged -c /etc/iotedge/config.yaml Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Using config file: /etc/iotedge/config.yaml Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Configuring /var/lib/iotedge as the home directory. Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Configuring certificates... Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Transparent gateway certificates not found, operating in quick start modJan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Finished configuring provisioning environment variables and certificatesJan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Initializing hsm... Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Finished initializing hsm. Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Provisioning edge device... Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Starting provisioning edge device via TPM... Jan 22 22:55:25 home-control iotedged[16250]: 2020-01-22T21:55:25Z [INFO] - Starting DPS registration with scope_id "0ne00052797", registration_id " ```
edge-agent logs ``` $ iotedge logs edgeAgent A module runtime error occurred caused by: Could not get logs for module edgeAgent caused by: an IO error occurred: Connection reset by peer (os error 104) ```
edge-hub logs ``` $ iotedge logs edgeHub A module runtime error occurred caused by: Could not get logs for module edgeHub caused by: an IO error occurred: Connection reset by peer (os error 104) ```

Additional Information

Please provide any additional information that may be helpful in understanding the issue.

$ journalctl -xe
Jan 22 10:56:36 home-control systemd[1]: Stopping Azure IoT Edge daemon management socket.
-- Subject: Unit iotedge.mgmt.socket has begun shutting down
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit iotedge.mgmt.socket has begun shutting down.
Jan 22 10:56:36 home-control systemd[1]: Starting Azure IoT Edge daemon management socket.
-- Subject: Unit iotedge.mgmt.socket has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit iotedge.mgmt.socket has begun starting up.
Jan 22 10:56:36 home-control systemd[1]: Listening on Docker Socket for the API.
-- Subject: Unit docker.socket has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit docker.socket has finished starting up.
--
-- The start-up result is done.
Jan 22 10:56:36 home-control systemd[1]: Listening on Azure IoT Edge daemon workload socket.
-- Subject: Unit iotedge.socket has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit iotedge.socket has finished starting up.
--
-- The start-up result is done.
Jan 22 10:56:36 home-control systemd[1]: Listening on Azure IoT Edge daemon management socket.
-- Subject: Unit iotedge.mgmt.socket has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit iotedge.mgmt.socket has finished starting up.
--
-- The start-up result is done.
Jan 22 10:56:36 home-control systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit docker.service has begun starting up.
JeeWeetje commented 4 years ago

The received version via sudo apt upgrade of the moby-engine and moby-cli is odd and different from stable release numbers. Has something in the moby CI/CD pipeline gone wrong, publishing a wrong package?

$ sudo apt upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
  aufs-dkms aufs-tools dkms linux-kbuild-4.9 raspberrypi-kernel-headers
Use 'sudo apt autoremove' to remove them.
The following packages will be upgraded:
  moby-cli moby-engine
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 53.4 MB of archives.
After this operation, 48.2 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 https://packages.microsoft.com/debian/stretch/multiarch/prod stretch/main armhf moby-cli armhf 3.0.9+azure-38 [19.5 MB]
Get:2 https://packages.microsoft.com/debian/stretch/multiarch/prod stretch/main armhf moby-engine armhf 3.0.9+azure-38 [33.9 MB]
Fetched 53.4 MB in 26s (2,041 kB/s)
Reading changelogs... Done
(Reading database ... 120781 files and directories currently installed.)
Preparing to unpack .../moby-cli_3.0.9+azure-38_armhf.deb ...
Unpacking moby-cli (3.0.9+azure-38) over (3.0.7) ...
Preparing to unpack .../moby-engine_3.0.9+azure-38_armhf.deb ...
Unpacking moby-engine (3.0.9+azure-38) over (3.0.7) ...
Setting up moby-engine (3.0.9+azure-38) ...
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
docker.service couldn't start.
Setting up moby-cli (3.0.9+azure-38) ...
veyalla commented 4 years ago

Adding @cpuguy83

cpuguy83 commented 4 years ago

Can you check journalctl -fu docker?

cpuguy83 commented 4 years ago

Has something in the moby CI/CD pipeline gone wrong, publishing a wrong package?

No this is a new build pipeline and the extra metadata on the version numbers is expected.

JeeWeetje commented 4 years ago

@cpuguy83 thanks for pointing at journalctl -fu docker

It showed this:

Jan 22 23:13:00 home-control dockerd[1945]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '}' after array element

Checked my daemon.json file and there was an } instead of a ] in my listed dns server ip addresses list.

...
"dns": [
        "1xx.1xx.6.66",
        "1xx.1xx.9.99",
        "1xx.1xx.104.104"
        },
...

Must have been last night late night alternations (was fiddling around with DNS and IPv6 stuff, missed this revert to normal adjustment. :-(

Saving the correct daemon.json file and restarting docker fixed it.

Sorry for the reporting, thanks for clarifying the new versioning metadata on the moby packages. Closing the issue.

cpuguy83 commented 4 years ago

😅 Thanks for reporting back!