Azure / iotedge

The IoT Edge OSS project
MIT License
1.45k stars 458 forks source link

iotedge check failed to report connectivity errors (was "iotedge 1.1.4 reinstall on Raspberry Pi OS fails to start") #5396

Open kimhanse opened 3 years ago

kimhanse commented 3 years ago

We have 9 devices running with iotedge on vessels around the world, 6 of them failed within minutes of each other on July 29. They just stopped reporting to Azure and started giving different errors in the logs. We have not been able for figure out what happened.

Upgrading some packages and restarting the devices a few times have brought 3 of them back online and working again. We still have 3 devices that are stuck and then are broken in the same way. We have tried to get back to a clean slate and thus uninstalled and reinstall iotedge, even with that we keep getting the same error. The security daemon starts, but does not try to pull the docker images for the agent or hub.

Expected Behavior

We except the security daemon to pull and start the iotedge agent and hub. Or to at least report why that doesn't happen.

Current Behavior

When the fresh install of iotedge is starting up it fails to start the service modules, instead it complaints about failed connections to them:

Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Starting watchdog with 60 second frequency...
Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Listening on fd://iotedge.mgmt.socket/ with 1 thread for management API.
Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Listening on fd://iotedge.socket/ with 1 thread for workload API.
Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Checking edge runtime status
Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Creating and starting edge runtime module edgeAgent
Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] - Error in watchdog when checking for edge runtime status:
Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] - A module runtime error occurred.
Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] -         caused by: Could not get identity $edgeAgent
Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] -         caused by: Could not get module $edgeAgent
Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] -         caused by: HTTP request failed: [401 Unauthorized] {"Message":"ErrorCode:IotHubNotFound;Unauthorized","ExceptionMessage":"Tracking ID:ab11464ec5db43d69b8a38bdd0e1777e-G:3-TimeStamp:08/20/2021 11:48:06"}

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Uninstall moby-engine and iotedge from device
  2. Install moby-engine
  3. Install iotedge
  4. Setup the old config of iotedge

Context (Environment)

Output of iotedge check

Click here ``` root@tracey-1:~# iotedge check --verbose --ntp-server=time1.google.com:123 Configuration checks -------------------- √ config.yaml is well-formed - OK √ config.yaml has well-formed connection string - OK √ container engine is installed and functional - OK √ config.yaml has correct hostname - OK × config.yaml has correct URIs for daemon mgmt endpoint - Error SocketError - SocketErrorCode (AccessDenied) : Permission denied /var/run/iotedge/mgmt.sock One or more errors occurred. (Got bad response: ) caused by: docker returned exit code: 1, stderr = SocketError - SocketErrorCode (AccessDenied) : Permission denied /var/run/iotedge/mgmt.sock One or more errors occurred. (Got bad response: ) √ latest security daemon - OK √ host time is close to real time - OK √ container time is close to host time - OK √ DNS server - OK √ production readiness: certificates - OK √ production readiness: container engine - OK √ production readiness: logs policy - OK × production readiness: Edge Agent's storage directory is persisted on the host filesystem - Error Could not check current state of edgeAgent container caused by: docker returned exit code: 1, stderr = Error: No such object: edgeAgent × production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error Could not check current state of edgeHub container caused by: docker returned exit code: 1, stderr = Error: No such object: edgeHub Connectivity checks ------------------- √ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK √ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK √ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK √ container on the default network can connect to IoT Hub AMQP port - OK √ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK √ container on the default network can connect to IoT Hub MQTT port - OK √ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK √ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK √ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK 20 check(s) succeeded. 3 check(s) raised errors. root@tracey-1:~# ```

Device Information

Runtime Versions

Logs

iotedged and system logs Logs from `journalctl` ``` Aug 20 11:47:21 tracey-1 systemd[1]: Reloading. Aug 20 11:47:21 tracey-1 systemd[1]: Starting Azure IoT Edge daemon management socket. Aug 20 11:47:21 tracey-1 systemd[1]: Starting Azure IoT Edge daemon workload socket. Aug 20 11:47:21 tracey-1 systemd[1]: Listening on Azure IoT Edge daemon management socket. Aug 20 11:47:21 tracey-1 systemd[1]: Listening on Azure IoT Edge daemon workload socket. Aug 20 11:47:21 tracey-1 systemd[1]: Started Azure IoT Edge daemon. Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Starting Azure IoT Edge Security Daemon Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Version - 1.1.4 Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Using config file: /etc/iotedge/config.yaml Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Configuring /var/lib/iotedge as the home directory. Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Configuring certificates... Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Transparent gateway certificates not found, operating in quick start mode... Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Finished configuring provisioning environment variables and certificates. Aug 20 11:47:21 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] - Initializing hsm... Aug 20 11:47:22 tracey-1 systemd[1]: Reloading. Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:44Z [INFO] - Finished initializing hsm. Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:44Z [INFO] - Provisioning edge device... Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:44Z [INFO] - Starting provisioning edge device via manual mode using a device connection string... Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:44Z [ERR!] - The daemon could not start up successfully: Could not load settings Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:44Z [ERR!] - caused by: Edge device information is required. Aug 20 11:47:44 tracey-1 iotedged[5499]: Please update config.yaml with the IoT Hub connection information. Aug 20 11:47:44 tracey-1 iotedged[5499]: See https://aka.ms/iot-edge-configure-linux for more details. Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] (/project/edgelet/hsm-sys/azure-iot-hsm-c/src/hsm_log.c:log_init:41) Initialized logging Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] (/project/edgelet/hsm-sys/azure-iot-hsm-c/src/edge_hsm_client_store.c:generate_edge_hsm_certificates_if_needed:1629) Load status 1. Regenerating owner and device CA certs and keys Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:21Z [INFO] (/project/edgelet/hsm-sys/azure-iot-hsm-c/src/edge_pki_openssl.c:generate_rsa_key:346) Generating RSA key of length 4096 Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:33Z [INFO] (/project/edgelet/hsm-sys/azure-iot-hsm-c/src/edge_pki_openssl.c:generate_rsa_key:346) Generating RSA key of length 4096 Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:44Z [ERR!] (/project/edgelet/hsm-sys/azure-iot-hsm-c/src/hsm_utils.c:read_file_into_buffer_impl:167) Could not open file for reading /var/lib/iotedge/hsm/enc_keys/edgelet-masterWt5mT2xpO72EPKlt2Tt0Sq4uJCrMvfl2rzzKRB3pnyo_.enc.key. Errno 2 'No such file or directory' Aug 20 11:47:44 tracey-1 iotedged[5499]: 2021-08-20T11:47:44Z [INFO] (/project/edgelet/hsm-sys/azure-iot-hsm-c/src/edge_hsm_client_store.c:load_encryption_key_from_file:818) Could not read encryption key from file. No key file exists or is invalid or permission error. Aug 20 11:47:44 tracey-1 systemd[1]: iotedge.service: Main process exited, code=exited, status=153/n/a Aug 20 11:47:44 tracey-1 systemd[1]: iotedge.service: Failed with result 'exit-code'. Aug 20 11:47:51 tracey-1 systemd[1]: iotedge.mgmt.socket: Succeeded. Aug 20 11:47:51 tracey-1 systemd[1]: Closed Azure IoT Edge daemon management socket. Aug 20 11:47:51 tracey-1 systemd[1]: Stopping Azure IoT Edge daemon management socket. Aug 20 11:47:51 tracey-1 systemd[1]: Starting Azure IoT Edge daemon management socket. Aug 20 11:47:51 tracey-1 systemd[1]: iotedge.socket: Succeeded. Aug 20 11:47:51 tracey-1 systemd[1]: Closed Azure IoT Edge daemon workload socket. Aug 20 11:47:51 tracey-1 systemd[1]: Stopping Azure IoT Edge daemon workload socket. Aug 20 11:47:51 tracey-1 systemd[1]: Starting Azure IoT Edge daemon workload socket. Aug 20 11:47:51 tracey-1 systemd[1]: Listening on Azure IoT Edge daemon management socket. Aug 20 11:47:51 tracey-1 systemd[1]: Listening on Azure IoT Edge daemon workload socket. Aug 20 11:47:51 tracey-1 systemd[1]: Started Azure IoT Edge daemon. Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Starting Azure IoT Edge Security Daemon Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Version - 1.1.4 Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Using config file: /etc/iotedge/config.yaml Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Configuring /var/lib/iotedge as the home directory. Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Configuring certificates... Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Configuring the Device CA certificate using "/etc/iotedge/azure-iot.root.ca.cert.pem". Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Configuring the Device private key using "/etc/iotedge/azure-iot.root.ca.key.pem". Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Configuring the trusted CA certificates using "/etc/iotedge/azure-iot.root.ca.cert.pem". Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Finished configuring provisioning environment variables and certificates. Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Initializing hsm... Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Finished initializing hsm. Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Provisioning edge device... Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Starting provisioning edge device via manual mode using a device connection string... Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Manually provisioning device "tracey-1" in hub "CSP-Autolog-IoT.azure-devices.net" Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Finished provisioning edge device. Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Initializing the module runtime... Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Initializing module runtime... Aug 20 11:47:51 tracey-1 iotedged[5587]: 2021-08-20T11:47:51Z [INFO] - Using runtime network id azure-iot-edge Aug 20 11:47:51 tracey-1 systemd-udevd[153]: /lib/udev/rules.d/80-moby-engine.rules:1: Invalid key/value pair, starting at character 1 ('c') Aug 20 11:47:51 tracey-1 systemd-udevd[5608]: Using default interface naming scheme 'v240'. Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Successfully initialized module runtime Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Finished initializing the module runtime. Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Stopping all modules... Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Finished stopping modules. Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Detecting if configuration file has changed... Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Change to configuration file detected. Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Removing all modules... Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Finished removing modules. Aug 20 11:47:52 tracey-1 iotedged[5587]: 2021-08-20T11:47:52Z [INFO] - Edge issuer CA expiration date: 2034-07-23T21:00:33Z Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Starting management API... Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Starting workload API... Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Starting watchdog with 60 second frequency... Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Listening on fd://iotedge.mgmt.socket/ with 1 thread for management API. Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Listening on fd://iotedge.socket/ with 1 thread for workload API. Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Checking edge runtime status Aug 20 11:48:03 tracey-1 iotedged[5587]: 2021-08-20T11:48:03Z [INFO] - Creating and starting edge runtime module edgeAgent Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] - Error in watchdog when checking for edge runtime status: Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] - A module runtime error occurred. Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] - caused by: Could not get identity $edgeAgent Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] - caused by: Could not get module $edgeAgent Aug 20 11:48:06 tracey-1 iotedged[5587]: 2021-08-20T11:48:06Z [WARN] - caused by: HTTP request failed: [401 Unauthorized] {"Message":"ErrorCode:IotHubNotFound;Unauthorized","ExceptionMessage":"Tracking ID:ab11464ec5db43d69b8a38bdd0e1777e-G:3-TimeStamp:08/20/2021 11:48:06"} Aug 20 11:49:03 tracey-1 iotedged[5587]: 2021-08-20T11:49:03Z [INFO] - Checking edge runtime status Aug 20 11:49:03 tracey-1 iotedged[5587]: 2021-08-20T11:49:03Z [INFO] - Creating and starting edge runtime module edgeAgent Aug 20 11:49:04 tracey-1 iotedged[5587]: 2021-08-20T11:49:04Z [WARN] - Error in watchdog when checking for edge runtime status: Aug 20 11:49:04 tracey-1 iotedged[5587]: 2021-08-20T11:49:04Z [WARN] - A module runtime error occurred. Aug 20 11:49:04 tracey-1 iotedged[5587]: 2021-08-20T11:49:04Z [WARN] - caused by: Could not get identity $edgeAgent Aug 20 11:49:04 tracey-1 iotedged[5587]: 2021-08-20T11:49:04Z [WARN] - caused by: Could not get module $edgeAgent Aug 20 11:49:04 tracey-1 iotedged[5587]: 2021-08-20T11:49:04Z [WARN] - caused by: HTTP request failed: [401 Unauthorized] {"Message":"ErrorCode:IotHubNotFound;Unauthorized","ExceptionMessage":"Tracking ID:ede6e6809e354ab09d9a901ec5b0a684-G:3-TimeStamp:08/20/2021 11:49:04"} Aug 20 11:50:03 tracey-1 iotedged[5587]: 2021-08-20T11:50:03Z [INFO] - Checking edge runtime status Aug 20 11:50:03 tracey-1 iotedged[5587]: 2021-08-20T11:50:03Z [INFO] - Creating and starting edge runtime module edgeAgent Aug 20 11:50:04 tracey-1 iotedged[5587]: 2021-08-20T11:50:04Z [WARN] - Error in watchdog when checking for edge runtime status: Aug 20 11:50:04 tracey-1 iotedged[5587]: 2021-08-20T11:50:04Z [WARN] - A module runtime error occurred. Aug 20 11:50:04 tracey-1 iotedged[5587]: 2021-08-20T11:50:04Z [WARN] - caused by: Could not get identity $edgeAgent Aug 20 11:50:04 tracey-1 iotedged[5587]: 2021-08-20T11:50:04Z [WARN] - caused by: Could not get module $edgeAgent Aug 20 11:50:04 tracey-1 iotedged[5587]: 2021-08-20T11:50:04Z [WARN] - caused by: HTTP request failed: [401 Unauthorized] {"Message":"ErrorCode:IotHubNotFound;Unauthorized","ExceptionMessage":"Tracking ID:fa2aa3c55b92462eaaff7d3fb44bca99-G:3-TimeStamp:08/20/2021 11:50:03"} Aug 20 11:51:03 tracey-1 iotedged[5587]: 2021-08-20T11:51:03Z [INFO] - Checking edge runtime status Aug 20 11:51:03 tracey-1 iotedged[5587]: 2021-08-20T11:51:03Z [INFO] - Creating and starting edge runtime module edgeAgent Aug 20 11:51:04 tracey-1 iotedged[5587]: 2021-08-20T11:51:04Z [WARN] - Error in watchdog when checking for edge runtime status: Aug 20 11:51:04 tracey-1 iotedged[5587]: 2021-08-20T11:51:04Z [WARN] - A module runtime error occurred. Aug 20 11:51:04 tracey-1 iotedged[5587]: 2021-08-20T11:51:04Z [WARN] - caused by: Could not get identity $edgeAgent Aug 20 11:51:04 tracey-1 iotedged[5587]: 2021-08-20T11:51:04Z [WARN] - caused by: Could not get module $edgeAgent Aug 20 11:51:04 tracey-1 iotedged[5587]: 2021-08-20T11:51:04Z [WARN] - caused by: HTTP request failed: [401 Unauthorized] {"Message":"ErrorCode:IotHubNotFound;Unauthorized","ExceptionMessage":"Tracking ID:f342d45d04c44bc5ab2221a6b91db618-G:3-TimeStamp:08/20/2021 11:51:04"} Aug 20 11:52:03 tracey-1 iotedged[5587]: 2021-08-20T11:52:03Z [INFO] - Checking edge runtime status Aug 20 11:52:03 tracey-1 iotedged[5587]: 2021-08-20T11:52:03Z [INFO] - Creating and starting edge runtime module edgeAgent Aug 20 11:52:04 tracey-1 iotedged[5587]: 2021-08-20T11:52:04Z [WARN] - Error in watchdog when checking for edge runtime status: Aug 20 11:52:04 tracey-1 iotedged[5587]: 2021-08-20T11:52:04Z [WARN] - A module runtime error occurred. Aug 20 11:52:04 tracey-1 iotedged[5587]: 2021-08-20T11:52:04Z [WARN] - caused by: Could not get identity $edgeAgent Aug 20 11:52:04 tracey-1 iotedged[5587]: 2021-08-20T11:52:04Z [WARN] - caused by: Could not get module $edgeAgent Aug 20 11:52:04 tracey-1 iotedged[5587]: 2021-08-20T11:52:04Z [WARN] - caused by: HTTP request failed: [401 Unauthorized] {"Message":"ErrorCode:IotHubNotFound;Unauthorized","ExceptionMessage":"Tracking ID:2011dd76518f4b5c8d62a0ad78154c59-G:3-TimeStamp:08/20/2021 11:52:04"} ```
iotedge list ``` root@tracey-1:~# iotedge list NAME STATUS DESCRIPTION CONFIG root@tracey-1:~# ```

No edge-agent logs or edge-hub logs as the modules aren't even loaded.

Additional Information

Please provide any additional information that may be helpful in understanding the issue.

pmzara commented 3 years ago

Thanks, @kimhanse for your report. How is your connection string? All the devices are connecting with the same IoT Hub?

kimhanse commented 3 years ago

Connection string: HostName=CSP-Autolog-IoT.azure-devices.net;DeviceId=tracey-1;SharedAccessKey=<secret key>=

All 9 devices are connected to the same IoT Hub.

I am also running 3 other similar devices connected to a different IoT Hub, they are not affected by this problem.

kimhanse commented 3 years ago

I have run the first startup of the security manager on both the device in the field where we have the problem and a test setup we have in the office. I am using the same connection string for both systems, and I have verified that all installed packages etc. are the same on the two devices.

I have enabled debug logging in order to get some more information but I can not see why the systems are acting differently.

I am not sending the entire log, but all the earlier parts that I have removed are practically the same for the two runs.

tracey-1 failing correct start ``` Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [INFO] - Creating and starting edge runtime module edgeAgent Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [edgelet_http::client] Success generating token for request GET /devices/tracey-1/modules/$edgeAgent Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [hyper::client::connect::dns] resolving host="csp-autolog-iot.azure-devices.net" Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [hyper::client::connect::http] connecting to 13.79.172.43:443 Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:18 tracey-1 iotedged[4618]: 2021-08-23T08:41:18Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [hyper::client::connect::http] connected to Some(13.79.172.43:443) Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 2 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:19 tracey-1 iotedged[4618]: 2021-08-23T08:41:19Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [hyper::proto::h1::io] flushed 286 bytes Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:20 tracey-1 iotedged[4618]: 2021-08-23T08:41:20Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [hyper::proto::h1::io] read 406 bytes Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [hyper::proto::h1::io] parsed 6 headers Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [hyper::proto::h1::conn] incoming body is content-length (152 bytes) Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [hyper::proto::h1::conn] incoming body completed Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [hyper::client::pool] pooling idle connection for "https://csp-autolog-iot.azure-devices.net" Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [WARN] - Error in watchdog when checking for edge runtime status: Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [WARN] - A module runtime error occurred. Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [WARN] - caused by: Could not get identity $edgeAgent Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [WARN] - caused by: Could not get module $edgeAgent Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [WARN] - caused by: HTTP request failed: [401 Unauthorized] {"Message":"ErrorCode:IotHubNotFound;Unauthorized","ExceptionMessage":"Tracking I D:22e258c9c54145a089184d9a0af7ffb6-G:17-TimeStamp:08/23/2021 08:41:20"} Aug 23 08:41:21 tracey-1 iotedged[4618]: 2021-08-23T08:41:21Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s ```
tracey-2 starting correctly ``` Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [INFO] - Creating and starting edge runtime module edgeAgent Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [edgelet_http::client] Success generating token for request GET /devices/tracey-1/modules/$edgeAgent Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::client::connect::dns] resolving host="csp-autolog-iot.azure-devices.net" Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::client::connect::http] connecting to 20.50.65.131:443 Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::client::connect::http] connected to Some(20.50.65.131:443) Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 2 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::io] flushed 288 bytes Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::io] read 745 bytes Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::io] parsed 6 headers Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::conn] incoming body is content-length (521 bytes) Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::conn] incoming body completed Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::client::pool] pooling idle connection for "https://csp-autolog-iot.azure-devices.net" Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [INFO] - Updating identity for module $edgeAgent Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [edgelet_http::client] Success generating token for request PUT /devices/tracey-1/modules/$edgeAgent Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::client::pool] reuse idle connection for "https://csp-autolog-iot.azure-devices.net" Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::io] flushed 572 bytes Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::io] read 745 bytes Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::io] parsed 6 headers Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::conn] incoming body is content-length (521 bytes) Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::proto::h1::conn] incoming body completed Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [hyper::client::pool] pooling idle connection for "https://csp-autolog-iot.azure-devices.net" Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 1 events, 0.000s Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [INFO] - Pulling image mcr.microsoft.com/azureiotedge-agent:1.1... Aug 23 10:28:22 tracey-2 iotedged[24131]: 2021-08-23T10:28:22Z [DBUG] - [tokio_reactor] loop process - 0 events, 0.000s ```
kimhanse commented 3 years ago

Hi @pmzara, did you see my response above?

pmzara commented 3 years ago

@kimhanse trace-1 failing is from the in the field device? trace-2 correct is from in the office device? If yes, how do the devices from the field connect with IoT Hub? VPN, Proxy, etc. Could you check if in both situations the name resolution (DNS) to IoT Hub is returning the same IP?

kimhanse commented 3 years ago

@pmzara I think you are right in suspecting the DNS:

DNS lookup from tracey-1 (the failing field device):

CSP-Autolog-IoT.azure-devices.net is an alias for ihsu-prod-db-003.cloudapp.net.
ihsu-prod-db-003.cloudapp.net has address 13.79.172.43

DNS lookup from tracey-2 (working device in the office):

CSP-Autolog-IoT.azure-devices.net is an alias for ihsu-northeurope-4.northeurope.cloudapp.azure.com.
ihsu-northeurope-4.northeurope.cloudapp.azure.com has address 20.50.65.131

And when I look at the other devices the working one all resolve the address to 20.50.65.131 while the broken ones resolve to 13.79.172.43.

The internet connection in the field is using satellites, we are not using VPN or proxy. There is a firewall but it should be open enough for Iot Hub to work.

kimhanse commented 3 years ago

An other point that suggests that the problem is the DNS is that if I insert the following list in /etc/hosts the device starts working:

20.50.65.131  CSP-Autolog-IoT.azure-devices.net
We are using 8.8.8.8 for DNS and are getting the following result ``` root@tracey-1:~# dig -4 -t A CSP-Autolog-IoT.azure-devices.net @8.8.8.8 ; <<>> DiG 9.11.5-P4-5.1+deb10u5-Raspbian <<>> -4 -t A CSP-Autolog-IoT.azure-devices.net @8.8.8.8 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60056 ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;CSP-Autolog-IoT.azure-devices.net. IN A ;; ANSWER SECTION: CSP-Autolog-IoT.azure-devices.net. 13 IN CNAME ihsu-prod-db-003.cloudapp.net. ihsu-prod-db-003.cloudapp.net. 59 IN A 13.79.172.43 ;; Query time: 0 msec ;; SERVER: 8.8.8.8#53(8.8.8.8) ;; WHEN: Wed Aug 25 12:01:56 UTC 2021 ;; MSG SIZE rcvd: 107 root@tracey-1:~# ```
veyalla commented 3 years ago

@kimhanse - Thanks for the update, do you need any more assistance with this issue?

kimhanse commented 3 years ago

@veyalla No, I don't need more help with my local problem.

But the error reporting here is non-existent and that caused me to spend days debugging a pretty simple problem. It should be possible for the security manager to realize that the connection is not working and report that in the log with a clear error message.

veyalla commented 3 years ago

This seems like a networking issue, DNS is resolving to stale IP addresses on some systems. I'm not sure how we can detect this in the application layer.

I presume 'iotedge check' was showing connectivity check errors on the broken systems?

kimhanse commented 3 years ago

@veyalla 'iotedge check' did not show any connectivity errors, see the details at the top of the story. It only returned 3 errors and they were all related to the fact that edgeAgent and edgeHub were not running.

It should be easy to detect that you don't get in touch with IoT Hub in Azure.

github-actions[bot] commented 3 years ago

This issue is being marked as stale because it has been open for 30 days with no activity.

micahl commented 2 years ago

Renaming this issue to "iotedge check failed to report connectivity errors"