Open jangernert opened 2 weeks ago
If you suspect that an SKSurface is being finalized, you can try collecting GC and finalization events using dotnet-trace: https://github.com/dotnet/diagnostics/blob/main/documentation/dotnet-trace-instructions.md#commonly-used-keywords-for-the-microsoft-windows-dotnetruntime-provider
My guess is that you are using some Skia primitives to render UI (like SkPaint or SkShader) and forgetting to dispose some instances. If there is a lot of code, you can collect a memory dump during the crash (with procdump, for example). You will probably see one more thread that calls Dispose on the instance (from the destructor), so you will be able to inspect the object and guess where it was created.
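To illustrate the disposal point above, here is a minimal sketch of deterministic cleanup with SkiaSharp. It is not taken from the reporter's code or from Avalonia's renderer: `GradientRenderer` and `RenderToPng` are hypothetical names, and the sketch assumes the SkiaSharp NuGet package is referenced. Every native-backed object is declared with `using`, so it is released on the calling thread when the method exits instead of being left for the finalizer thread.

```csharp
using System;
using SkiaSharp;

// Hypothetical helper for illustration only.
static class GradientRenderer
{
    public static byte[] RenderToPng(int width, int height)
    {
        // SKSurface.Create can return null if the surface cannot be allocated.
        using var surface = SKSurface.Create(new SKImageInfo(width, height));
        if (surface is null)
            throw new InvalidOperationException("Could not create an SKSurface.");

        // The shader and paint wrap native handles; `using` disposes them
        // at the end of this method, in reverse order of declaration.
        using var shader = SKShader.CreateLinearGradient(
            new SKPoint(0, 0),
            new SKPoint(width, height),
            new[] { SKColors.CornflowerBlue, SKColors.White },
            new[] { 0f, 1f },
            SKShaderTileMode.Clamp);
        using var paint = new SKPaint { Shader = shader, IsAntialias = true };

        surface.Canvas.DrawRect(new SKRect(0, 0, width, height), paint);

        // The snapshot and the encoded data are disposable as well.
        using var image = surface.Snapshot();
        using var data = image.Encode(SKEncodedImageFormat.Png, 100);
        return data.ToArray();
    }
}

static class Program
{
    static void Main()
    {
        byte[] png = GradientRenderer.RenderToPng(64, 64);
        Console.WriteLine($"Encoded {png.Length} bytes");
    }
}
```

If an object has to outlive a single method (for example a cached `SKPaint` field), the owning class would need to implement `IDisposable` and dispose the field itself; the `using` pattern only covers locals.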
Describe the bug
A few systems running a 24/7 kiosk UI crash randomly with the following exception:
To Reproduce
I've been looking at crash statistics and trying to reproduce the issue for quite some time, but I have found no clear pattern so far. I haven't managed to reproduce the crash even once on my development machine (which runs Linux).
Some users seem to be affected quite heavily, while others encounter the crash rarely or not at all, even though they run very similar hardware.
"Affected heavily" in this case means once or twice a day, which is not often enough for iterative testing with debug builds.
Expected behavior
Not crashing
Avalonia version
11.1.4
OS
Windows
Additional context
A relevant issue from the SkiaSharp tracker: https://github.com/mono/SkiaSharp/issues/2125
It suggests the crash could be caused by not disposing the SKSurface in time after drawing. The Avalonia rendering code, with all its abstractions, is not straightforward to follow, so I'm not sure whether this is actually the issue here.
The issue started to appear after a big refactor and visual redesign of the UI code, so maybe how Avalonia is used is part of the equation. But without being able to consistently trigger the crash, this is also just a guess.
I haven't had the time to update to Avalonia 11.2 yet. Once I have, and the update has been deployed for a while, I can report whether 11.2 fixed the issue.