Azure / iotedge

The IoT Edge OSS project
MIT License

Possible regression in ARM64 mcr.microsoft.com/azureiotedge-diagnostics TAG:1.2.7 IMAGE ID:4395063a43dd CREATED:2 weeks ago SIZE:177MB #6106

Closed DavidGrob-MS closed 2 years ago

DavidGrob-MS commented 2 years ago

Expected Behavior

The output of sudo iotedge check --verbose should match the output of sudo iotedge check --diagnostics-image-name mcr.microsoft.com/azureiotedge-diagnostics:1.2.6 --verbose

Current Behavior

With the 1.2.6 diagnostics image, every check is green apart from a few warnings. With 1.2.7, the 8 errors listed below appear. The issue does not repro on an AMD64 VM image, only on our ARM64 device. IoT Edge and the hub otherwise appear to work fine (a direct method ping to edgeAgent from the portal returns 200 OK).

Steps to Reproduce

  1. Install iotedge on custom ubuntu 21.10 image
  2. sudo iotedge config mp --connection-string 'PASTE_DEVICE_CONNECTION_STRING_HERE'
  3. sudo iotedge config apply
  4. sudo iotedge system status
  5. sudo iotedge check --verbose
  6. sudo iotedge check --diagnostics-image-name mcr.microsoft.com/azureiotedge-diagnostics:1.2.6 --verbose
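
Steps 5 and 6 can be compared mechanically rather than by eye. The following is a minimal sketch, assuming that iotedge check writes its report to stdout and marks each failed check with a leading × as in the output below; count_failed_checks is a hypothetical helper name, not part of the iotedge CLI.

```shell
# Hypothetical helper: count failed checks in an `iotedge check` report
# read from stdin. A failed check is assumed to be a line whose first
# non-blank character is "×".
count_failed_checks() {
  # grep -c exits non-zero when the count is 0, so mask the exit status.
  grep -c '^[[:space:]]*×' || true
}

# Intended usage on the device (not executed here):
#   sudo iotedge check --verbose | count_failed_checks
#   sudo iotedge check --diagnostics-image-name \
#     mcr.microsoft.com/azureiotedge-diagnostics:1.2.6 --verbose | count_failed_checks
```

If both commands print the same count, the two diagnostics images agree at least on the number of failing checks.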

Context (Environment)

Custom Ubuntu 21.10 ARM64 image running on an NVIDIA Jetson Xavier

Output of iotedge check

```
root@percept-devtest:~# sudo iotedge check --diagnostics-image-name mcr.microsoft.com/azureiotedge-diagnostics:1.2.6 --verbose

Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
√ configuration has correct URIs for daemon mgmt endpoint - OK
√ aziot-edge package is up-to-date - OK
√ container time is close to host time - OK
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
        caused by: Could not open container engine config file /etc/docker/daemon.json
        caused by: No such file or directory (os error 2)
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
        caused by: Could not open container engine config file /etc/docker/daemon.json
        caused by: No such file or directory (os error 2)
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
    The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
√ Agent image is valid and can be pulled from upstream - OK
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the default network can connect to upstream MQTT port - OK
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream MQTT port - OK

32 check(s) succeeded.
4 check(s) raised warnings.

root@percept-devtest:~# sudo iotedge check --verbose

Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
        caused by: docker returned exit code: 139, stderr =
√ aziot-edge package is up-to-date - OK
× container time is close to host time - Error
    Could not query local time inside container
        caused by: docker returned exit code: 139, stderr =
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
        caused by: Could not open container engine config file /etc/docker/daemon.json
        caused by: No such file or directory (os error 2)
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
        caused by: Could not open container engine config file /etc/docker/daemon.json
        caused by: No such file or directory (os error 2)
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
    The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
√ Agent image is valid and can be pulled from upstream - OK
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
× container on the default network can connect to upstream AMQP port - Error
    Container on the default network could not connect to test-vaidk.azure-devices.net:5671
        caused by: docker returned exit code: 139, stderr =
× container on the default network can connect to upstream HTTPS / WebSockets port - Error
    Container on the default network could not connect to test-vaidk.azure-devices.net:443
        caused by: docker returned exit code: 139, stderr =
× container on the default network can connect to upstream MQTT port - Error
    Container on the default network could not connect to test-vaidk.azure-devices.net:8883
        caused by: docker returned exit code: 139, stderr =
× container on the IoT Edge module network can connect to upstream AMQP port - Error
    Container on the azure-iot-edge network could not connect to test-vaidk.azure-devices.net:5671
        caused by: docker returned exit code: 139, stderr =
× container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - Error
    Container on the azure-iot-edge network could not connect to test-vaidk.azure-devices.net:443
        caused by: docker returned exit code: 139, stderr =
× container on the IoT Edge module network can connect to upstream MQTT port - Error
    Container on the azure-iot-edge network could not connect to test-vaidk.azure-devices.net:8883
        caused by: docker returned exit code: 139, stderr =

24 check(s) succeeded.
4 check(s) raised warnings.
8 check(s) raised errors.
root@percept-devtest:~#
```
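
A note on reading the "docker returned exit code: 139" lines: by shell convention, an exit status above 128 means the process was terminated by signal (status - 128), and signal 11 is SIGSEGV on Linux. So 139 suggests the process inside the diagnostics container crashed rather than reporting a real connectivity failure. The sketch below only illustrates that arithmetic; describe_exit_status is a hypothetical helper name, and the status alone does not identify the root cause.

```shell
# Hypothetical helper: interpret a container exit status using the
# common "128 + signal number" convention.
describe_exit_status() {
  code=$1
  if [ "$code" -gt 128 ]; then
    # Values above 128 conventionally encode the terminating signal.
    echo "killed by signal $((code - 128))"
  else
    echo "exited with status $code"
  fi
}

describe_exit_status 139   # prints "killed by signal 11" (SIGSEGV on Linux)
```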

Device Information

Runtime Versions

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

support_bundle.zip

veyalla commented 2 years ago

I can repro this. It seems the dotnet binary in the ARM64 mcr.microsoft.com/azureiotedge-diagnostics:1.2.7 image is not built for the right architecture:

× configuration has correct URIs for daemon mgmt endpoint - Error
    Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.2.7' locally
    1.2.7: Pulling from azureiotedge-diagnostics
    db4feefed179: Pulling fs layer
    7c1d7a949c06: Pulling fs layer
    10d77364a69e: Pulling fs layer
    612bc974c136: Pulling fs layer
    ed017cdf655e: Pulling fs layer
    7a268ea7bdac: Pulling fs layer
    612bc974c136: Waiting
    7a268ea7bdac: Waiting
    ed017cdf655e: Waiting
    10d77364a69e: Verifying Checksum
    10d77364a69e: Download complete
    7c1d7a949c06: Verifying Checksum
    7c1d7a949c06: Download complete
    db4feefed179: Verifying Checksum
    db4feefed179: Download complete
    612bc974c136: Verifying Checksum
    612bc974c136: Download complete
    ed017cdf655e: Verifying Checksum
    ed017cdf655e: Download complete
    db4feefed179: Pull complete
    7c1d7a949c06: Pull complete
    10d77364a69e: Pull complete
    612bc974c136: Pull complete
    ed017cdf655e: Pull complete
    7a268ea7bdac: Verifying Checksum
    7a268ea7bdac: Download complete
    7a268ea7bdac: Pull complete
    Digest: sha256:f47449496054ffed2bb1f6dfdb1c5f9c3e8da8714281521c8cf4f325b006a643
    Status: Downloaded newer image for mcr.microsoft.com/azureiotedge-diagnostics:1.2.7
    exec /usr/bin/dotnet: exec format error
        caused by: docker returned exit code: 1, stderr = Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.2.7' locally
                   1.2.7: Pulling from azureiotedge-diagnostics
                   db4feefed179: Pulling fs layer
                   7c1d7a949c06: Pulling fs layer
                   10d77364a69e: Pulling fs layer
                   612bc974c136: Pulling fs layer
                   ed017cdf655e: Pulling fs layer
                   7a268ea7bdac: Pulling fs layer
                   612bc974c136: Waiting
                   7a268ea7bdac: Waiting
                   ed017cdf655e: Waiting
                   10d77364a69e: Verifying Checksum
                   10d77364a69e: Download complete
                   7c1d7a949c06: Verifying Checksum
                   7c1d7a949c06: Download complete
                   db4feefed179: Verifying Checksum
                   db4feefed179: Download complete
                   612bc974c136: Verifying Checksum
                   612bc974c136: Download complete
                   ed017cdf655e: Verifying Checksum
                   ed017cdf655e: Download complete
                   db4feefed179: Pull complete
                   7c1d7a949c06: Pull complete
                   10d77364a69e: Pull complete
                   612bc974c136: Pull complete
                   ed017cdf655e: Pull complete
                   7a268ea7bdac: Verifying Checksum
                   7a268ea7bdac: Download complete
                   7a268ea7bdac: Pull complete
                   Digest: sha256:f47449496054ffed2bb1f6dfdb1c5f9c3e8da8714281521c8cf4f325b006a643
                   Status: Downloaded newer image for mcr.microsoft.com/azureiotedge-diagnostics:1.2.7
                   exec /usr/bin/dotnet: exec format error
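
An "exec format error" usually means the kernel was handed a binary built for a different CPU. One way to confirm that is to read the e_machine field of the binary's ELF header (2 bytes at offset 18; 62 is x86-64, 183 is AArch64, 40 is 32-bit ARM). The following is a minimal sketch, assuming a little-endian host and a little-endian ELF file; elf_machine is a hypothetical helper name, and the docker commands in the comments are one possible way to extract the file, not something taken from this thread.

```shell
# Hypothetical helper: print the target architecture recorded in an
# ELF file's header. Assumes host and file are both little-endian.
elf_machine() {
  # e_machine is an unsigned 16-bit field at byte offset 18.
  code=$(od -An -tu2 -j18 -N2 "$1" | tr -d ' \n')
  case "$code" in
    62)  echo "x86_64" ;;
    183) echo "aarch64" ;;
    40)  echo "arm" ;;
    *)   echo "unknown ($code)" ;;
  esac
}

# Possible usage on the device (not executed here); -L follows a symlink
# in case /usr/bin/dotnet is one:
#   id=$(docker create mcr.microsoft.com/azureiotedge-diagnostics:1.2.7)
#   docker cp -L "$id":/usr/bin/dotnet ./dotnet && docker rm "$id"
#   elf_machine ./dotnet    # compare with: uname -m
```

On an ARM64 host, anything other than aarch64 here would be consistent with the error above.
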
yophilav commented 2 years ago

@DavidGrob-MS Thank you for the report. We have addressed the Diagnostic Module issue in release 1.2.8. Please feel free to upgrade the module to the latest version, which should solve the issue.