Open janhenhan opened 1 year ago
Thanks for the report!
This sounds like a nasty bug, I hope we can find the cause and fix it.
It kinda sounds like a use-after-free bug where the cmd pointer is accessed after it has been freed somewhere else. However, it is freed literally in the next line, and not somewhere else ...
Smells a bit like undefined behavior ...
These bad_accesses happen somewhere in APF?
Well, yes, the CommandQueue is used to send messages from the control thread to the audio thread (and back). It might be a problem in the APF, but not necessarily.
Any other logs that would help?
I don't know. It seems the problem happens when calling the cleanup() function, but before this function is actually executed.
I'm on an M1 Mac.
That's a good hint. I have the feeling that our ring buffer implementation might not be correct on ARM processors.
Are you running the SSR natively or via Rosetta?
The first thing I would try is to use atomics in our ring buffer and see if that changes anything. Currently, I don't have a lot of time, but maybe I can try a few things next week.
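To make the atomics idea a bit more concrete, here is a minimal sketch of a single-producer/single-consumer ring buffer whose read and write indices are `std::atomic` with acquire/release ordering. This is not the actual APF ring buffer: the class name `SpscRingBuffer` and its `push()`/`pop()` interface are hypothetical, and C++17 is assumed for `std::optional`. The point is only that x86 has a comparatively strong memory model that can hide missing ordering, while ARM may reorder the write of the payload and the write of the index unless the code asks for ordering explicitly.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical single-producer/single-consumer ring buffer (NOT the APF code).
// Exactly one thread may call push() and exactly one other thread may call pop().
template<typename T>
class SpscRingBuffer
{
public:
  explicit SpscRingBuffer(std::size_t capacity)
    : _slots(capacity + 1)  // one slot stays unused to tell "full" from "empty"
  {
    assert(capacity > 0);
  }

  // Producer side. Returns false if the buffer is full.
  bool push(T value)
  {
    const auto write = _write.load(std::memory_order_relaxed);
    const auto next = advance(write);
    // acquire: pairs with the release store in pop(), so the consumer is
    // guaranteed to be done with the slot before it is overwritten
    if (next == _read.load(std::memory_order_acquire)) return false;
    _slots[write] = std::move(value);
    // release: the slot contents become visible before the new write index
    _write.store(next, std::memory_order_release);
    return true;
  }

  // Consumer side. Returns std::nullopt if the buffer is empty.
  std::optional<T> pop()
  {
    const auto read = _read.load(std::memory_order_relaxed);
    // acquire: pairs with the release store in push(), so the slot contents
    // written by the producer are visible here
    if (read == _write.load(std::memory_order_acquire)) return std::nullopt;
    std::optional<T> value{std::move(_slots[read])};
    // release: the slot is handed back only after its contents were moved out
    _read.store(advance(read), std::memory_order_release);
    return value;
  }

private:
  std::size_t advance(std::size_t i) const { return (i + 1) % _slots.size(); }

  std::vector<T> _slots;
  std::atomic<std::size_t> _write{0};  // only modified by the producer
  std::atomic<std::size_t> _read{0};   // only modified by the consumer
};
```

With plain (non-atomic) indices, the compiler or the CPU is free to make the index update visible before the payload, in which case the other thread could read a slot that has not been fully written yet, or one that has already been handed back. Whether that is what happens here is still just a guess.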
Thanks Matthias! It would be really great if you can find the time to have a look at some point :)
Are you running the SSR natively or via Rosetta?
I'm running a native M1 ARM build.
For what it is worth, one possibly questionable observation: SSR seems to crash much sooner when I start it as a subprocess in Python than when I wait for it to crash in the debugger. But that may just be subjective, or within the range of the highly variable times it runs before crashing.
Hi all,
Great to see you guys are still going strong developing SSR after over a decade. Congrats on the 0.6 release!
Recently, I've increased the number of network messages I send to ssr-brs (for example, 20 sources that each receive messages updating some of their attributes at a 100 Hz update rate). Unfortunately, that came with a big decrease in the stability of the SSR.
I am experiencing unexpected crashes after varying amounts of time: sometimes it runs fine for hours, other times only for minutes. At first I thought this might be the fault of the older FUDI interface (there are some open issues here describing similar crashes with the older network interface), so I switched to the more recent WebSocket interface. Unfortunately, the same crashes occur there. The messages I send all seem to contain values within a valid range, i.e. as far as I can tell, no particular message crashes ssr-brs.
I attached the process to lldb, however the messages mean very little to me - most of the time it is a bad access in the cleanup: " Process 45934 stopped
Any thoughts on what this means or how I could prevent it, to get ssr-brs back to a more robust state? These bad_accesses happen somewhere in APF? Any other logs that would help? I'm on an M1 Mac.
Many thanks!