Closed eanavalentin closed 2 months ago
We are also seeing no modules start when rebooting our devices with the network cable disconnected, or when restarting VMs in a test environment with the NIC disabled. Is this expected behavior? It doesn't seem correct.
After more testing, it seems the issue is caused by a missing or invalid backup.json. The file contains a base64-encoded string that is not readable once decoded. However, the edgeAgent is sometimes able to use it to start all modules correctly, even with no connection; at other times, the behavior initially reported is observed.
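For anyone wanting to check the state of the backup file on a device, here is a minimal Python sketch. The function name `inspect_backup` and the result labels are made up for illustration and are not part of IoT Edge. The path in the usage line is the one from the edgeAgent log message in this issue. An "opaque" result (valid base64 that is not readable JSON) matches what is described above; whether that is because the payload is encrypted before encoding is an assumption, not something confirmed in this thread.

```python
import base64
import binascii
import json
from pathlib import Path


def inspect_backup(path):
    """Classify a backup file as 'missing', 'empty', 'not-base64', 'opaque', or 'json'.

    Hypothetical helper for illustration only; not an IoT Edge API.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    raw = p.read_bytes().strip()
    if not raw:
        return "empty"
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(raw, validate=True)
    except (binascii.Error, ValueError):
        return "not-base64"
    try:
        json.loads(decoded.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Valid base64, but the decoded bytes are not plain JSON text
        return "opaque"
    return "json"


if __name__ == "__main__":
    # Path taken from the edgeAgent log line quoted in this issue
    print(inspect_backup("/tmp/edgeAgent/edgeAgent/backup.json"))
```

A "missing" or "empty" result would line up with the cases where no modules are started offline, while "opaque" only tells you the file is present and well-formed base64, not that the edgeAgent can use it.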
@eanavalentin thanks for sharing the info, and glad you were able to get the issue sorted out. Do you have any further questions, or shall we close this?
@eanavalentin closing this issue. Please re-open it or let us know if you have any further questions.
We have a couple hundred devices (odroid c4) that have a very bad and unreliable connection: at times up to 80% packet loss and ping replies of up to 2.5 seconds. The devices are provisioned through DPS with a symmetric key while on a reliable network. Once everything is up and running, they are switched to the unreliable network.
During normal operation, for reasons out of our control, the devices might reboot. When they come back online, the edgeAgent attempts to connect to IoT Hub, but while doing this, it does not start any of the modules.
The device remains non-functional during this time, since our modules are never started.
Sometimes, after starting, we see it trying to pull the containers again from the container registry and throwing timeout errors because of the slow connection. It is our understanding that no containers should be pulled once they have been created, as long as no new deployment is received.
Additionally, on every restart, the module containers are stopped, removed, and recreated. Should this happen? Shouldn't the containers, once created, remain the same unless a new deployment is received?
Furthermore, the edgeAgent keeps logging
Empty edge agent config was received. Attempting to read config from backup (/tmp/edgeAgent/edgeAgent/backup.json) instead
every 10 seconds or so. There is no /tmp/edgeAgent/edgeAgent/backup.json file, even though the path is persisted, mounted inside the container, and the StorageFolder is set correctly.
Expected Behavior
The edgeAgent should start all modules regardless of whether it is able to connect to the cloud. Containers should not be recreated after a reboot.
Current Behavior
The edgeAgent fails to start the containers for a very long time, until the connection gets a little better or, possibly, after some restarts. All containers besides the edgeAgent container are recreated (docker ps -a returns a different Container Id for each).
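One way to confirm the recreation is to capture the container list before and after a reboot and compare the IDs. The sketch below assumes the snapshots were taken with `docker ps -a --no-trunc --format '{{.Names}} {{.ID}}'` and saved as text; the helper names `parse_ps` and `recreated` and the module name `myModule` are hypothetical and used only for illustration.

```python
def parse_ps(output):
    """Parse lines of '<name> <id>' into a dict mapping name -> container ID."""
    containers = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Container names contain no whitespace, so split on the first gap
        name, container_id = line.split(None, 1)
        containers[name] = container_id
    return containers


def recreated(before, after):
    """Return sorted names present in both snapshots whose container ID changed."""
    return sorted(n for n in before if n in after and before[n] != after[n])


if __name__ == "__main__":
    # Illustrative snapshots; the IDs and 'myModule' are made-up placeholders
    before = parse_ps("edgeAgent aaa111\nedgeHub bbb222\nmyModule ccc333\n")
    after = parse_ps("edgeAgent aaa111\nedgeHub ddd444\nmyModule eee555\n")
    print(recreated(before, after))
```

With the placeholder data, every container except edgeAgent is reported, which is the pattern described above.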
Steps to Reproduce
Context (Environment)
Output of
iotedge check
```
Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
‼ aziot-identity-service package is up-to-date - Warning
    Installed aziot-identity-service package has version 1.4.7 but 1.4.8 is the latest stable version available.
    Please see https://aka.ms/aziot-update-runtime for update instructions.
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK
√ host can connect to and perform TLS handshake with DPS endpoint - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
√ configuration has correct URIs for daemon mgmt endpoint - OK
‼ aziot-edge package is up-to-date - Warning
    Installed IoT Edge daemon has version 1.4.27 but 1.4.33 is the latest stable version available.
    Please see https://aka.ms/iotedge-update-runtime for update instructions.
√ container time is close to host time - OK
√ DNS server - OK
√ production readiness: logs policy - OK
√ production readiness: Edge Agent's storage directory is persisted on the host filesystem - OK
√ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK
√ Agent image is valid and can be pulled from upstream - OK
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK

34 check(s) succeeded.
2 check(s) raised warnings. Re-run with --verbose for more details.
2 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details.
```
Device Information
Runtime Versions
Note: when using Windows containers on Windows, run
docker -H npipe:////./pipe/iotedge_moby_engine version
instead
Logs
aziot-edged logs
```
[edged.txt](https://github.com/user-attachments/files/16026538/edged.txt)
```
edge-agent logs
```
[edgeAgent.txt](https://github.com/user-attachments/files/16026542/edgeAgent.txt)
```
edge-hub logs
```
edgeHub is not starting
```
Additional Information
We understand from here that the containers should not be recreated after a restart.