Azure / iotedge

The IoT Edge OSS project
MIT License

IoT device works perfectly for days, then it does not - reboot fixes issue #7245

Closed codeputer closed 1 month ago

codeputer commented 3 months ago

I'm looking for help diagnosing a problem where my IoT device stops working and requires a reboot. After the reboot, the logs appear to have been reset, so I lose the device's state from before the failure.

In my case, I have EdgeAgent and EdgeHub running, and my custom module is deployed (call it XXXConnectDeployed). I'm running Bookworm (64-bit Linux) on a Raspberry Pi Zero 2. My application targets .NET 8 and has several timers running, each executing code on its own interval (I read from various sensors at different cadences). I believe I have written thread-safe code so that no locks occur.

Everything works perfectly for several days, and then it stops. Perhaps it's a memory leak?

When it becomes unresponsive, the only sign of life is the green I/O activity light on the board. At this point, it will NOT allow me to SSH in (using either the IP address or the hostname). I log in using a private key.

I don't know how to diagnose this issue without access to the logs from just before the device dies. Rebooting seems to clear the historical logs. I have log rollover configured to ensure disk space is not an issue. I could add log messages for memory consumption, for example, but I can't find a way to see them after the device dies. I feel I need to solve how to access historical logs first, before adding more information to the logs.

I'm looking for an approach, or any documentation, that would help me tackle this problem.
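One way to keep memory readings that outlive a hang is to sample them outside the module and force each line to disk as it is written. The following is only a minimal sketch, not something from the IoT Edge tooling: it uses Python rather than the module's .NET 8, it assumes a standard Linux `/proc/meminfo`, and the function names and the log path in the usage note are hypothetical. It also assumes the target file lives on persistent storage rather than a tmpfs.

```python
import os
import time
from datetime import datetime, timezone


def read_meminfo(path="/proc/meminfo"):
    """Parse a /proc/meminfo style file into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    with open(path, "r", encoding="ascii") as f:
        for line in f:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts and parts[0].isdigit():
                values[key.strip()] = int(parts[0])
    return values


def format_sample(meminfo, now=None):
    """Build one timestamped log line from selected fields (-1 if missing)."""
    now = now or datetime.now(timezone.utc)
    fields = ("MemTotal", "MemAvailable", "SwapFree")
    body = " ".join(f"{name}={meminfo.get(name, -1)}kB" for name in fields)
    return f"{now.isoformat(timespec='seconds')} {body}"


def append_durably(log_path, line):
    """Append a line and fsync it so it is on disk before a hard power cycle."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write it to storage


def run(log_path, interval_seconds=60):
    """Sample forever; the last lines show the trend right before a hang."""
    while True:
        append_durably(log_path, format_sample(read_meminfo()))
        time.sleep(interval_seconds)
```

Calling `run("/var/log/memwatch.log")` (a hypothetical path) from a small service would leave a trail that can be read after the reboot. A steadily falling `MemAvailable` in the final entries would be consistent with the memory-leak theory, while a flat line would point elsewhere.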

codeputer commented 3 months ago

I know this is vague as an issue, but I'm looking for an approach that gives me more information about why it stops, at the moment it stops.

david-emakenemi commented 3 months ago

Hey @codeputer, can you open an Azure support ticket and attach the support bundle to the support case so we can take a look at it?

ryanwinter commented 1 month ago

Closing due to no activity. @codeputer, please reopen if this is still outstanding for you.