Azure / iotedge

The IoT Edge OSS project
MIT License
1.46k stars 461 forks source link

aziot-edged 1.2.0 Offline start fails #4964

Closed pierrickcurt closed 3 years ago

pierrickcurt commented 3 years ago

Expected Behavior

After successful provisioning and image pulling, we would like to be able of starting in offline mode (without network available )

Current Behaviour

The aziot-edged is not starting we can see the error:

[WARN] - The daemon could not start up successfully: Could not retrieve device information
[WARN] -         caused by: HTTP response error: [400 Bad Request] {"message":"device identity not found"}

If the network is back After restarting aziot-identityd.servicethen aziot-edged.service, the agent start is ok

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Provision a raspberry with method DPS symmetric key
  2. Start the iotedge stack, deploy a manifest with a edgeAgent and a edgHub (1.2.0) and wait for the image pulling
  3. Remove ethernet wire or disable wifi
  4. systemctl restart aziot-identityd.service
  5. systemctl restart aziot-edged.service

Context (Environment)

I am working on porting the iotedge stack version 1.2.0 on Yocto (Dunfell and Thud version).

I have reproduce the behaviour on raspberry 3 There device is provisioned with DPS and symmetric key method

Output of iotedge check

Click here ``` Configuration checks (aziot-identity-service) --------------------------------------------- √ keyd configuration is well-formed - OK √ certd configuration is well-formed - OK √ tpmd configuration is well-formed - OK √ identityd configuration is well-formed - OK √ daemon configurations up-to-date with config.toml - OK √ identityd config toml file specifies a valid hostname - OK √ aziot-identity-service package is up-to-date - OK √ host time is close to reference time - OK √ preloaded certificates are valid - OK √ keyd is running - OK √ certd is running - OK √ identityd is running - OK √ read all preloaded certificates from the Certificates Service - OK √ read all preloaded key pairs from the Keys Service - OK √ ensure all preloaded certificates match preloaded private keys with the same ID - OK Connectivity checks (aziot-identity-service) -------------------------------------------- ‼ host can connect to and perform TLS handshake with iothub AMQP port - Warning Could not retrieve iothub_hostname from provisioning file. Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information. Since no hostname is provided, all hub connectivity tests will be skipped. ‼ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - Warning Could not retrieve iothub_hostname from provisioning file. Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information. Since no hostname is provided, all hub connectivity tests will be skipped. ‼ host can connect to and perform TLS handshake with iothub MQTT port - Warning Could not retrieve iothub_hostname from provisioning file. Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information. Since no hostname is provided, all hub connectivity tests will be skipped. √ host can connect to and perform TLS handshake with DPS endpoint - OK Configuration checks -------------------- √ aziot-edged configuration is well-formed - OK √ configuration up-to-date with config.toml - OK √ container engine is installed and functional - OK √ configuration has correct URIs for daemon mgmt endpoint - OK × aziot-edge package is up-to-date - Error Could not spawn aziot-edged process √ container time is close to host time - OK ‼ DNS server - Warning Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub. Please see https://aka.ms/iotedge-prod-checklist-dns for best practices. You can ignore this warning if you are setting DNS server per module in the Edge deployment. √ IPv6 network configuration - OK ‼ production readiness: container engine - Warning Device is not using a production-supported container engine (moby-engine). Please see https://aka.ms/iotedge-prod-checklist-moby for details. ‼ production readiness: logs policy - Warning Container engine is not configured to rotate module logs which may cause it run out of disk space. Please see https://aka.ms/iotedge-prod-checklist-logs for best practices. You can ignore this warning if you are setting log policy per module in the Edge deployment. √ production readiness: Edge Agent's storage directory is persisted on the host filesystem - OK √ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK √ Agent image is valid and can be pulled from upstream - OK Connectivity checks ------------------- 25 check(s) succeeded. 6 check(s) raised warnings. Re-run with --verbose for more details. 1 check(s) raised errors. Re-run with --verbose for more details. 6 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details. ```

Device Information

Runtime Versions

Logs

aziot-edged logs ``` May 10 08:55:22 phycore-stm32mp1-peach systemd[1]: Started Azure IoT Edge daemon. May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Starting Azure IoT Edge Module Runtime May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Version - 1.2.0 May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Initializing the module runtime... May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Initializing module runtime... May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Using runtime network id azure-iot-edge May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Successfully initialized module runtime May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Finished initializing the module runtime. May 10 08:55:22 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:22Z [INFO] - Obtaining edge device provisioning data... May 10 08:55:23 phycore-stm32mp1-peach aziot-edged[842]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185m2021-05-10T08:55:23Z [WARN] - The daemon could not start up successfully: Could not retrieve device informat ion[[0m May 10 08:55:23 phycore-stm32mp1-peach aziot-edged[842]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185m2021-05-10T08:55:23Z [WARN] - caused by: HTTP response error: [400 Bad Request] {"message":"device i dentity not found"}[[0m May 10 08:55:28 phycore-stm32mp1-peach aziot-edged[842]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185m2021-05-10T08:55:28Z [WARN] - Retrying getting edge device provisioning information.[[0m May 10 08:55:28 phycore-stm32mp1-peach aziot-edged[842]: 2021-05-10T08:55:28Z [INFO] - Obtaining edge device provisioning data... May 10 08:55:28 phycore-stm32mp1-peach aziot-edged[842]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185m2021-05-10T08:55:28Z [WARN] - The daemon could not start up successfully: Could not retrieve device informat ion[[0m ```
aziot-identityd.service logs ``` May 10 08:55:09 phycore-stm32mp1-peach systemd[1]: Started Azure IoT Identity Service. May 10 08:55:10 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:10Z [INFO] - Starting service... May 10 08:55:10 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:10Z [INFO] - Version - dev build May 10 08:55:10 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:10Z [INFO] - Provisioning starting. Reason: Startup May 10 08:55:10 phycore-stm32mp1-peach aziot-identityd[394]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185m2021-05-10T08:55:10Z [WARN] - Failed to reprovision device. Running offline. Reprovisioning failure reas on: DPS client error.[[0m May 10 08:55:10 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:10Z [INFO] - Starting server... May 10 08:55:22 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:22Z [INFO] - <-- POST /identities/device?api-version=2020-09-01 {"content-type": "application/json", "host": "2f72756e2f617a69 6f742f6964656e74697479642e736f636b:0", "content-length": "16"} May 10 08:55:23 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:23Z [INFO] - !!! device identity not found May 10 08:55:23 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:23Z [INFO] - --> 400 {"content-type": "application/json"} May 10 08:55:28 phycore-stm32mp1-peach aziot-identityd[394]: 2021-05-10T08:55:28Z [INFO] - <-- POST /identities/device?api-version=2020-09-01 {"content-type": "application/json", "host": "2f72756e2f617a69 6f742f6964656e74697479642e736f636b:0", "content-length": "16"} ```
dylanbronson commented 3 years ago

Hello, I was able to repro your issue. I am now looking into why this is happening, if it's an expected scenario, or if it's a potential bug.

dylanbronson commented 3 years ago

I have logged a bug, as this scenario should be working. Thank you for bringing it to our attention! I will update you with when we plan to fix it

dylanbronson commented 3 years ago

Just to update you, we've successfully made and tested a fix for this. No ETA on the release for it, yet, though

pierrickcurt commented 3 years ago

ok thanks for the update

github-actions[bot] commented 3 years ago

This issue is being marked as stale because it has been open for 30 days with no activity.

micahl commented 3 years ago

@pierrickcurt use the 1.2.3 release. Closing this issue as it should have been fixed with https://github.com/Azure/iot-identity-service/commit/c5ceaad492ef09014c328ac1a6bbd1a0d765b82e.