Azure / iotedge

The IoT Edge OSS project
MIT License

edgeHub Fails to Start After Faulty Route Configuration #7394

Closed (uriel-kluk closed 1 week ago)

uriel-kluk commented 2 weeks ago

Description: I configured an incorrect route for the MetricsCollector, generated the deployment manifest, and deployed it to my IoT Edge instance. The route did not function as expected. I then attempted to remove the faulty route and the MetricsCollector module itself, but edgeHub now fails to start.

Error Details: In the console log, the following error is observed:

<3> 2024-11-12 14:34:43.709 +00:00 [ERR] - Stopping with exception
System.TimeoutException: Operation timed out
   at Microsoft.Azure.Devices.Edge.Util.TaskEx.<>c.<.cctor>b__27_0() in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 114

The module twin for edgeHub shows this additional error:

"lastDesiredStatus": {
    "code": 400,
    "description": "Error while parsing desired properties - Error parsing route MetricToIothub - Exception of type 'Microsoft.Azure.Devices.Routing.Core.Query.RouteCompilationException' was thrown."
}
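
For anyone checking for the same symptom, here is a minimal sketch of reading `lastDesiredStatus` out of an exported edgeHub module twin. It assumes the twin has been saved as JSON and that the status sits under `properties.reported` (falling back to the top level otherwise); the function name `desired_status_error` is hypothetical and not part of any IoT Edge tooling.

```python
import json


def desired_status_error(twin_json: str):
    """Return (code, description) if the twin reports a failed
    desired-properties update, otherwise None."""
    twin = json.loads(twin_json)
    # Assumption: the status lives under properties.reported; if that
    # layout is absent, look at the top level of the document instead.
    reported = twin.get("properties", {}).get("reported", twin)
    status = reported.get("lastDesiredStatus")
    if not status:
        return None
    code = status.get("code")
    # Treat any 2xx code as success; everything else is reported as an error.
    if code is None or 200 <= code < 300:
        return None
    return code, status.get("description", "")


# Sample twin fragment modelled on the error quoted in this issue.
sample = json.dumps({
    "properties": {
        "reported": {
            "lastDesiredStatus": {
                "code": 400,
                "description": "Error while parsing desired properties - "
                               "Error parsing route MetricToIothub",
            }
        }
    }
})

error = desired_status_error(sample)
if error:
    print(f"edgeHub rejected desired properties: {error[0]} {error[1]}")
```

A non-2xx code here suggests edgeHub rejected the last desired-properties update, which is consistent with the route parsing failure described above.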

Steps to Reproduce:

  1. Configure an incorrect route for the MetricsCollector in the deployment manifest.
  2. Deploy the manifest to an IoT Edge instance.
  3. Attempt to remove the faulty route and delete the MetricsCollector module.
  4. Observe that edgeHub fails to restart, with errors indicating a parsing failure and timeout exception.

Expected Behavior: After removing the faulty route and the MetricsCollector module, edgeHub should start without issues.

Actual Behavior: edgeHub fails to start due to a System.TimeoutException and parsing error related to the route configuration.

Possible Cause: The faulty route seems to prevent edgeHub from parsing desired properties, leading to a RouteCompilationException.

Suggested Fix: Investigate and ensure that route removal fully resets any faulty configuration in the edgeHub deployment manifest. Consider implementing additional validation or error handling for route configurations to prevent this issue.

Environment:


jlian commented 1 week ago

@bilalsellak (on-call) please see if you can repro?

bilalsellak commented 1 week ago

Issue does not reproduce

uriel-kluk commented 1 week ago

@bilalsellak I managed to delete everything under /var/lib/aziot/storage/edgeHub, then applied the config, and now it connects. It appears that my cache got corrupted.

We can close this issue as it was specific to my system.

Thanks for your help!
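
The workaround from the comment above (emptying edgeHub's local storage directory) could be sketched roughly as follows. This is only an illustration: the helper `clear_directory` and the constant `EDGEHUB_STORAGE` are hypothetical names, the path is the one quoted in this thread, and stopping the IoT Edge runtime before deleting anything is an assumption rather than something the thread confirms.

```python
import shutil
from pathlib import Path

# Path quoted in this thread; it may differ on other installations.
EDGEHUB_STORAGE = Path("/var/lib/aziot/storage/edgeHub")


def clear_directory(directory: Path) -> int:
    """Delete every entry inside `directory` while keeping the directory
    itself. Returns the number of top-level entries removed."""
    removed = 0
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it and everything below it.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove just the entry.
            entry.unlink()
        removed += 1
    return removed
```

Calling `clear_directory(EDGEHUB_STORAGE)` would need root privileges and discards any messages edgeHub had stored locally, so it is probably best reserved for cases like this one where the cached state itself appears to be corrupted.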