Azure / iotedge

The IoT Edge OSS project
MIT License
1.45k stars 457 forks source link

edgeAgent automatically stopping all modules #7227

Closed JelleCaptic closed 2 weeks ago

JelleCaptic commented 4 months ago

Expected Behavior

I would assume modules can stay up without a certain time limit or auto-shutdown mechanism. On top of that, I would expect certain reasoning for automatically stopping modules.

Current Behavior

At a random timestamp the edgeAgent stopped all modules and executed a stop command for all my running modules

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

Output of iotedge check

All iotedge checks are ok.

Device Information

Runtime Versions

Server: Engine: Version: 23.0.7+azure-1 API version: 1.42 (minimum version 1.12) Go version: go1.19.12 Git commit: 606a7a76949ec08cb26940037b1c71c0024c6b44 Built: Wed Sep 6 06:38:37 2023 OS/Arch: linux/arm64 Experimental: false containerd: Version: 1.6.24-2 GitCommit: 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523 runc: Version: 1.1.9-1 GitCommit: ccaecfcbc907d70a7aa870a6650887b901b25b82 docker-init: Version: 0.19.0 GitCommit:


## Logs
<!--
Please share as many logs as possible. This will help debugging
Follow [diagnostic steps](https://docs.microsoft.com/en-us/azure/iot-edge/troubleshoot#standard-diagnostic-steps) to help extract useful information.
Don't forget to remove any connection string information!
-->

<details>
<summary>edge-agent logs</summary>

{"log":"\u003c6\u003e 2024-03-04 07:09:38.280 +00:00 [INF] - Starting periodic operation refresh twin config...\n","stream":"stdout","time":"2024-03-04T07:09:38.280722368Z"} {"log":"\u003c6\u003e 2024-03-04 07:09:38.303 +00:00 [INF] - Obtained Edge agent twin from IoTHub with desired properties version 108 and reported properties version 871.\n","stream":"stdout","time":"2024-03-04T07:09:38.303908896Z"} {"log":"\u003c6\u003e 2024-03-04 07:09:38.314 +00:00 [INF] - Successfully completed periodic operation refresh twin config\n","stream":"stdout","time":"2024-03-04T07:09:38.31436288Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.122 +00:00 [INF] - Termination requested, initiating shutdown.\n","stream":"stdout","time":"2024-03-04T07:13:25.123151872Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.123 +00:00 [INF] - Main thread terminated\n","stream":"stdout","time":"2024-03-04T07:13:25.123461504Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.131 +00:00 [INF] - Initiating shutdown cleanup.\n","stream":"stdout","time":"2024-03-04T07:13:25.141976576Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.133 +00:00 [INF] - Stopping all modules...\n","stream":"stdout","time":"2024-03-04T07:13:25.142001824Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.168 +00:00 [INF] - Waiting for cleanup to finish\n","stream":"stdout","time":"2024-03-04T07:13:25.1692384Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.270 +00:00 [INF] - Plan execution started for deployment -1\n","stream":"stdout","time":"2024-03-04T07:13:25.273090144Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.290 +00:00 [INF] - Executing command: \"Stop module db\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291056896Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module edge-data-manager\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291107968Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module api\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291170176Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module edge-api-frontend\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291252768Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.291 +00:00 [INF] - Executing command: \"Stop module edge-frontend\"\n","stream":"stdout","time":"2024-03-04T07:13:25.291321536Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.302 +00:00 [INF] - Updated reported properties\n","stream":"stdout","time":"2024-03-04T07:13:25.302620832Z"} {"log":"2024-03-04 07:13:40 Starting Edge Agent\n","stream":"stdout","time":"2024-03-04T07:13:40.783627552Z"} {"log":"2024-03-04 07:13:40 Changing ownership of storage folder: /tmp/edgeAgent to 13622\n","stream":"stdout","time":"2024-03-04T07:13:40.811136352Z"} {"log":"2024-03-04 07:13:40 Changing ownership of backup folder: /tmp/edgeAgent_backup to 13622\n","stream":"stdout","time":"2024-03-04T07:13:40.817397472Z"} {"log":"2024-03-04 07:13:40 Changing ownership of management socket: /var/run/iotedge/mgmt.sock\n","stream":"stdout","time":"2024-03-04T07:13:40.831135328Z"} {"log":"2024-03-04 07:13:40 Completed necessary setup. Starting Edge Agent.\n","stream":"stdout","time":"2024-03-04T07:13:40.851940512Z"} {"log":"2024-03-04 07:13:40.896 +00:00 Edge Agent Main()\n","stream":"stdout","time":"2024-03-04T07:13:40.907348672Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:41.206 +00:00 [INF] - Initializing Edge Agent.\n","stream":"stdout","time":"2024-03-04T07:13:41.232074208Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:41.417 +00:00 [INF] - Version - 1.4.29.85541525 (f049017a5072f85aec250e945ebb2ca52e4af3f4)\n","stream":"stdout","time":"2024-03-04T07:13:41.417553152Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:41.418 +00:00 [INF] - \n","stream":"stdout","time":"2024-03-04T07:13:41.418499744Z"}

</details>

<details>
<summary>edge-hub logs</summary>

{"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" received FIN.\n","stream":"stdout","time":"2024-03-04T07:13:23.775449856Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\" disconnecting.\n","stream":"stdout","time":"2024-03-04T07:13:23.775573856Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\" stopped.\n","stream":"stdout","time":"2024-03-04T07:13:23.775600288Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:23.775 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" sending FIN because: \"\"The Socket transport's send loop completed gracefully.\"\"\n","stream":"stdout","time":"2024-03-04T07:13:23.775730112Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.127 +00:00 [INF] [EdgeHub] - Termination requested, initiating shutdown.\n","stream":"stdout","time":"2024-03-04T07:13:25.129527008Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.128 +00:00 [INF] [EdgeHub] - Stopping the protocol heads...\n","stream":"stdout","time":"2024-03-04T07:13:25.12964Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.130 +00:00 [INF] [EdgeHub] - Closing protocol heads - (MQTT, HTTP)\n","stream":"stdout","time":"2024-03-04T07:13:25.132696384Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.136 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MqttProtocolHead] - Stopping MQTT protocol head\n","stream":"stdout","time":"2024-03-04T07:13:25.137041024Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.141 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Http.HttpProtocolHead] - Closing HTTP head\n","stream":"stdout","time":"2024-03-04T07:13:25.14174832Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:25.143 +00:00 [DBG] [Microsoft.AspNetCore.Hosting.Diagnostics] - Hosting shutdown\n","stream":"stdout","time":"2024-03-04T07:13:25.1432352Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.177 +00:00 [INF] [EdgeHub] - Waiting for cleanup to finish\n","stream":"stdout","time":"2024-03-04T07:13:25.178299328Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:25.256 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" received FIN.\n","stream":"stdout","time":"2024-03-04T07:13:25.257041696Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:25.259 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets] - Connection id \"\"ID\"\" sending FIN because: \"\"The Socket transport's send loop completed gracefully.\"\"\n","stream":"stdout","time":"2024-03-04T07:13:25.259576512Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:25.259 +00:00 [DBG] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\" disconnecting.\n","stream":"stdout","time":"2024-03-04T07:13:25.260128576Z"} {"log":"\u003c4\u003e 2024-03-04 07:13:25.260 +00:00 [WRN] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel] - Error reading transport. CorrelationId ID\n","stream":"stdout","time":"2024-03-04T07:13:25.264976352Z"} {"log":"System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.\n","stream":"stdout","time":"2024-03-04T07:13:25.265037504Z"} {"log":" at System.Net.WebSockets.ManagedWebSocket.ThrowIfEOFUnexpected(Boolean throwOnPrematureClosure)\n","stream":"stdout","time":"2024-03-04T07:13:25.265044064Z"} {"log":" at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure)\n","stream":"stdout","time":"2024-03-04T07:13:25.265048352Z"} {"log":" at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder1.StateMachineBox1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)\n","stream":"stdout","time":"2024-03-04T07:13:25.265078624Z"} {"log":" at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TResult](Memory1 payloadBuffer, CancellationToken cancellationToken)\n","stream":"stdout","time":"2024-03-04T07:13:25.265084608Z"} {"log":" at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder1.StateMachineBox1.System.Threading.Tasks.Sources.IValueTaskSource\u003cTResult\u003e.GetResult(Int16 token)\n","stream":"stdout","time":"2024-03-04T07:13:25.265088352Z"} {"log":" at System.Threading.Tasks.ValueTask1.ValueTaskSourceAsTask.\u003c\u003ec.\u003c.cctor\u003eb__4_0(Object state)\n","stream":"stdout","time":"2024-03-04T07:13:25.265092256Z"} {"log":"--- End of stack trace from previous location ---\n","stream":"stdout","time":"2024-03-04T07:13:25.265095712Z"} {"log":" at Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel.DoReadBytes(IByteBuffer byteBuffer) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Mqtt/ServerWebSocketChannel.cs:line 239\n","stream":"stdout","time":"2024-03-04T07:13:25.26509904Z"} {"log":" at Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel.DoBeginRead() in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Mqtt/ServerWebSocketChannel.cs:line 133\n","stream":"stdout","time":"2024-03-04T07:13:25.265104608Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.265 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.ServerWebSocketChannel] - Transport aborted. CorrelationId ID\n","stream":"stdout","time":"2024-03-04T07:13:25.26565424Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.266 +00:00 [INF] [Microsoft.AspNetCore.Server.Kestrel.Connections] - Connection id \"\"ID\"\", Request id \"\"ID:00000002\"\": the application aborted the connection.\n","stream":"stdout","time":"2024-03-04T07:13:25.266200416Z"} {"log":"\u003c7\u003e 2024-03-04 07:13:25.266 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Http.Middleware.WebSocketHandlingMiddleware] - Request ID:00000002 completed. CorrelationId ID\n","stream":"stdout","time":"2024-03-04T07:13:25.269377952Z"} {"log":"\u003c6\u003e 2024-03-04 07:13:25.266 +00:00 [INF] [Microsoft.AspNetCore.Hosting.Diagnostics] - \"Request finished HTTP/1.1 GET https://gsd7449186/$iothub/websocket - - - 101 - - 956745.0866ms\"\n","stream":"stdout","time":"2024-03-04T07:13:25.269454272Z"}


</details>

## Additional Information
All the modules came back up after a couple of minutes automatically. 

What I would like to get more info about:
- What triggers the automatic shutdown of all modules because this needs to be tackled. 
- Is there a way to prevent this?
david-emakenemi commented 3 months ago

@bilalsellak can you take a look here?

ryanwinter commented 1 month ago

@JelleCaptic : Have you seen any further incidents like this in the past 2 months?

JelleCaptic commented 1 month ago

@ryanwinter No, the last 2 months were stable.

bishal41 commented 2 weeks ago

@JelleCaptic closing this issue based on your last comment and assuming that you haven't seen any issue since. Please feel free to open a new issue if needed