Closed MattCosturos closed 12 months ago
@MattCosturos from the logs it looks like what happens is there are connection issues and because of that, reauth cannot happen. Just to confirm, are you saying that it never recovers unless you restart edgeHub
module / run iotedge check
or that it does eventually restore on its own?
In fact, if it never recovers on its own, would you mind getting debug level logs if possible?
The iotedge check
restoring connectivity is super weird and not expected... Would also be curious if any network connectivity works on the device when you see this problem, since you say you can restart EH I assume you can get to the device. So basically, if you see it happening, don't run iotedge check
verify with curl
or something that network still is okay and then see if EH recovers on its own.
Hello @nyanzebra
Now this is getting weird. Device was in a disconnected state (has been disconnected for ~14 hours).
I attempted to
curl https://packages.microsoft.com/ubuntu/23.10/prod/pool/main/a/aspnetcore-runtime-6.0/aspnetcore-runtime-6.0_6.0.25-1_amd64.deb --output asp-runtime
It just hung (0 progress on the download) I tried several times.
Then I started logging packets with tcpdump
and attempted to curl again. It suddenly worked, and edgeHub
recovered on its own. I did nothing to restart iotedge daemons, or the system modules.
Everything I am debugging points to my network or network configuration being the cause of the issue. I am going to close this issue, I just wanted to get someone to look at the logs, and make sure there wasn't some edge runtime issue. But I am pretty sure every error in the logs stems from the initial connection issue.
@MattCosturos sounds good, feel free to reopen anytime if have further questions or issues :)
Expected Behavior
Modules should remain running and connected to the iot hub
Current Behaviora
During the periodic task to reauthenticate connected clients, the connection fails
Steps to Reproduce
Unable to reproduce on demand. Failures are seemingly random. Sometimes it will fail on the first renewal (after 1 hour) sometimes it will fail on the 2nd renewal (after 2), etc
Context (Environment)
Output of
iotedge check
Click here
``` Configuration checks (aziot-identity-service) --------------------------------------------- √ keyd configuration is well-formed - OK √ certd configuration is well-formed - OK √ tpmd configuration is well-formed - OK √ identityd configuration is well-formed - OK √ daemon configurations up-to-date with config.toml - OK √ identityd config toml file specifies a valid hostname - OK √ aziot-identity-service package is up-to-date - OK ‼ host time is close to reference time - Warning Could not query NTP server caused by: Could not query NTP server caused by: could not receive NTP server response: Resource temporarily unavailable (os error 11) caused by: Resource temporarily unavailable (os error 11) √ preloaded certificates are valid - OK √ keyd is running - OK √ certd is running - OK √ identityd is running - OK √ read all preloaded certificates from the Certificates Service - OK √ read all preloaded key pairs from the Keys Service - OK √ check all EST server URLs utilize HTTPS - OK √ ensure all preloaded certificates match preloaded private keys with the same ID - OK Connectivity checks (aziot-identity-service) -------------------------------------------- √ host can connect to and perform TLS handshake with iothub AMQP port - OK √ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK √ host can connect to and perform TLS handshake with iothub MQTT port - OK Configuration checks -------------------- √ aziot-edged configuration is well-formed - OK √ configuration up-to-date with config.toml - OK √ container engine is installed and functional - OK √ configuration has correct URIs for daemon mgmt endpoint - OK √ aziot-edge package is up-to-date - OK √ container time is close to host time - OK ‼ DNS server - Warning Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub. Please see https://aka.ms/iotedge-prod-checklist-dns for best practices. You can ignore this warning if you are setting DNS server per module in the Edge deployment. caused by: Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub. Please see https://aka.ms/iotedge-prod-checklist-dns for best practices. You can ignore this warning if you are setting DNS server per module in the Edge deployment. ‼ production readiness: logs policy - Warning Container engine is not configured to rotate module logs which may cause it run out of disk space. Please see https://aka.ms/iotedge-prod-checklist-logs for best practices. You can ignore this warning if you are setting log policy per module in the Edge deployment. caused by: Container engine is not configured to rotate module logs which may cause it run out of disk space. Please see https://aka.ms/iotedge-prod-checklist-logs for best practices. You can ignore this warning if you are setting log policy per module in the Edge deployment. √ production readiness: Edge Agent's storage directory is persisted on the host filesystem - OK √ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK √ Agent image is valid and can be pulled from upstream - OK √ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK Connectivity checks ------------------- √ container on the default network can connect to upstream AMQP port - OK √ container on the default network can connect to upstream HTTPS / WebSockets port - OK √ container on the default network can connect to upstream MQTT port - OK skipping because of not required in this configuration √ container on the IoT Edge module network can connect to upstream AMQP port - OK √ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK √ container on the IoT Edge module network can connect to upstream MQTT port - OK skipping because of not required in this configuration 32 check(s) succeeded. 3 check(s) raised warnings. 2 check(s) were skipped due to errors from other checks. ```Device Information
Runtime Versions
iotedge version
]: iotedge 1.4.20docker version
]:docker version
``` Client: Version: 20.10.18+azure-1 API version: 1.41 Go version: go1.18.6 Git commit: b40c2f6b5deeb11ac6c485c940865ee40664f0f0 Built: Thu Sep 8 08:19:02 UTC 2022 OS/Arch: linux/amd64 Context: default Experimental: true Server: Engine: Version: 24.0.7-1 API version: 1.43 (minimum version 1.12) Go version: go1.20.10 Git commit: 311b9ff0aa93aa55880e1e5f8871c4fb69583426 Built: Thu Oct 26 07:51:05 2023 OS/Arch: linux/amd64 Experimental: false containerd: Version: 1.5.13+azure-1 GitCommit: a17ec496a95e55601607ca50828147e8ccaeebf1 runc: Version: 1.1.4 GitCommit: 5fd4c4d144137e991c4acebb2146ab1483a97925 docker-init: Version: 0.19.0 GitCommit: de40ad0 ```Please see edgeHub.txt for the edge hub logs showing all the connection failures.
Recovery happens when restarting the
edgeHub
module OR randomly, but possibly when I runiotedge check
( Perhaps the iot hub connection check restores the network connection for edgeHub? )edgeAgent.txt azedged.txt azidentity.txt edgeHub.txt