WildernessLabs / Meadow_Issues

Public repo for bugs and issues with Meadow

Some errors (deadlocks/OOMs) do not seem to be generated/reported to Meadow.Cloud #782

Open duduita opened 3 days ago

duduita commented 3 days ago

Describe the bug
Some errors (deadlocks/OOMs) do not seem to be reported to Meadow.Cloud. In fact, mono_error.txt does not appear to be generated at all.

To Reproduce
Steps to reproduce an OOM (out-of-memory error):

  1. Call the AllocateMemory() method during app initialization:

        static void AllocateMemory()
        {
            List<byte[]> allocations = new List<byte[]>(); // Store the allocated memory
            int iteration = 0;
    
            // Loop to keep allocating 3MB chunks
            while (true)
            {
                byte[] memoryBlock = new byte[3 * 1024 * 1024]; // Allocate 3MB
                allocations.Add(memoryBlock); // Keep reference to prevent GC from collecting
                iteration++;
    
                Console.WriteLine($"Iteration: {iteration}, Allocated: {allocations.Count * 3} MB");
    
                // Sleep a little to simulate some delay between allocations
                Thread.Sleep(100);
            }
        }
  2. After the app restarts, run the meadow file list CLI command and check whether mono_error.txt is present on the device.

Expected behavior
mono_error.txt should have been created and sent to Meadow.Cloud.

Meadow (please complete the following information as best as you can):

Board Information
Model: F7Micro
Hardware version: F7CoreComputeV2
Device name: CellBasics

Hardware Information
Processor type: STM32F777IIK6
ID: 3A-00-21-00-0D-50-4B-55-30-38-31-20
Serial number: 20523874554B
Coprocessor type: ESP32
MAC Address - WiFi: 4C:75:25:D5:78:A0

Firmware Versions
OS: 1.14.0.0
Mono: 1.14.0.0
Coprocessor: 1.14.0.0
Protocol: 7

NevynUK commented 2 days ago

I am not sure that the code above actually represents an error in the OS. Mono does actually generate an exception which the application can catch.

Doesn't Core take on responsibility for unhandled exceptions if you have the right configuration settings in app.config.yaml?
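
The point that the exception is catchable can be sketched in plain .NET as follows. This is only an illustration, not Meadow or Mono internals: the OomProbe class, the AllocateUntilFailure method, and the injectable allocate delegate are hypothetical names introduced here so that the failure can be simulated without exhausting real memory.

```csharp
using System;
using System.Collections.Generic;

public static class OomProbe
{
    // Hypothetical helper: allocates chunks until the allocator throws
    // OutOfMemoryException or maxChunks is reached, then releases everything
    // and returns how many chunks were allocated successfully.
    public static int AllocateUntilFailure(Func<int, byte[]> allocate, int chunkSize, int maxChunks)
    {
        var allocations = new List<byte[]>();
        try
        {
            while (allocations.Count < maxChunks)
            {
                allocations.Add(allocate(chunkSize));
            }
        }
        catch (OutOfMemoryException)
        {
            // The application is still running at this point; the exception
            // was delivered to managed code rather than ending the process.
        }

        int succeeded = allocations.Count;
        allocations.Clear(); // Drop the references so the GC can reclaim the memory
        return succeeded;
    }
}
```

On a device, the allocator would simply be `size => new byte[size]`. Passing it in as a delegate is a design choice made for this sketch so that the catch path can be exercised with a simulated failure.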

duduita commented 1 day ago

I'm assuming that it represents an error in the OS, given that induce_reset() was called and the device rebooted, but I could be wrong @NevynUK.

NevynUK commented 1 day ago

I think I tried your code in a try/catch block and the device did not reset; it just looped. So I did this:

List<byte[]> allocations = new List<byte[]>();
int iteration = 0;

while (true)
{
    try
    {
        byte[] memoryBlock = new byte[3 * 1024 * 1024];
        allocations.Add(memoryBlock);
        iteration++;

        Console.WriteLine($"Iteration: {iteration}, Allocated: {allocations.Count * 3} MB");

        Thread.Sleep(100);
    }
    catch (Exception e)
    {
        // The allocation failure surfaces as a managed exception, so the loop keeps running
        Console.WriteLine($"OutOfMemory failed: {e.Message}");
    }
}

This suggests that the application is still running and that any reset is due to Core detecting an unhandled exception and rebooting the board. If that is the case, then I would expect Core, not the OS, to detect the unhandled exception and generate an error report file.
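
For illustration, application-level handling of an unhandled exception could look like the sketch below. It uses the standard .NET AppDomain.CurrentDomain.UnhandledException event and makes no claim about how Meadow.Core actually implements its reporting. The CrashReporter class and the app_error_report.txt file name are hypothetical and are not the mono_error.txt file discussed in this issue.

```csharp
using System;
using System.IO;

public static class CrashReporter
{
    // Hypothetical report location, chosen only for this sketch.
    public static readonly string ReportPath =
        Path.Combine(Path.GetTempPath(), "app_error_report.txt");

    // Writes the exception type and message to the report file.
    public static void WriteReport(Exception ex)
    {
        File.WriteAllText(
            ReportPath,
            $"{ex.GetType().FullName}: {ex.Message}{Environment.NewLine}");
    }

    // Hooks the standard .NET last-chance event for exceptions that no
    // try/catch handled, so a report is written before the process ends.
    public static void Register()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, args) =>
        {
            if (args.ExceptionObject is Exception ex)
            {
                WriteReport(ex);
            }
        };
    }
}
```

An application would call CrashReporter.Register() once during startup. Note that after a genuine out-of-memory condition the write itself may fail, since it also needs to allocate, so this sketch is not guaranteed to produce a report in that case.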

duduita commented 1 day ago

I think my reproduction sample is not the best one: the OOMs we usually see happen at the OS level, where we can't use try/catch. I'll look for another way to reproduce this issue, by triggering an OOM using only OS calls, and I'll think more about it. But, at first glance, if we have an OOM at the OS level, the OS should generate an error report file, right?