This is a follow-up to #657, dealing with `audio.onComplete`.
@naveen-pcs pointed out on the forum that there were still audio crashes in 3701.
The stack trace has a telltale `~BaseResourceHandle()`. After digging around, it seems the `BaseResource` in question was basically a way for the audio task to detect whether the underlying Lua state had been invalidated and, if so, do nothing. Unfortunately, setting up the resource updates a shared reference count, and it does so on the audio thread without any synchronization. Presumably that count gets out of whack and leads to something like a double free of the shared memory.
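For illustration, here is roughly what a handle's reference count has to look like if it is ever touched from more than one thread. The names (`SharedResource`, `Handle`, `hammer`) are hypothetical and are not the actual `BaseResourceHandle` code. With a plain `int` in place of `std::atomic<int>`, the same copy/drop loop is a data race and the count can drift, which is the double-free scenario described above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared resource; records how many times it has been destroyed.
struct SharedResource {
    static std::atomic<int> destroyed;
    std::atomic<int> refCount{1};
    ~SharedResource() { destroyed.fetch_add(1); }
};
std::atomic<int> SharedResource::destroyed{0};

// Hypothetical reference-counted handle (illustration only).
class Handle {
public:
    explicit Handle(SharedResource* r) : res_(r) { assert(r != nullptr); }  // adopts the initial reference
    Handle(const Handle& other) : res_(other.res_) {
        // Relaxed is enough for the increment: the copier already holds a reference.
        res_->refCount.fetch_add(1, std::memory_order_relaxed);
    }
    Handle& operator=(const Handle&) = delete;
    ~Handle() {
        // acq_rel so whichever thread drops the last reference sees all prior writes.
        if (res_->refCount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete res_;
        }
    }
private:
    SharedResource* res_;
};

// Copies and drops the handle from several threads at once.
void hammer(int threads, int iterations) {
    Handle root(new SharedResource);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&root, iterations] {
            for (int i = 0; i < iterations; ++i) {
                Handle copy(root);  // increment on a worker thread
            }                       // decrement on a worker thread
        });
    }
    for (auto& th : pool) th.join();
}  // root released here; the resource is destroyed exactly once
```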
The audio thread is started and closed within the Lua state's lifetime, and I don't see any way for the tasks to execute outside this range, since the scheduler is similarly contained. What I ended up doing, then, was to create a single `PlatformNotifier` belonging to the `Runtime` (so the reference count is only touched on the main thread) and to send the tasks through that. (All the accesses are read-only, aside from the already-hardened `Scheduler` methods.)
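A minimal sketch of that ownership pattern, assuming a simple post/drain queue: the runtime owns exactly one notifier, the audio thread only posts tasks, and the main thread runs them. `Notifier`, `Runtime`, `post`, `drain`, and `simulateCompletions` are hypothetical names, not Corona's API, and `std::mutex` is used only to keep the sketch short.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical notifier: any thread may post(); only the main thread calls drain().
class Notifier {
public:
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(task));
    }
    // Runs queued tasks on the calling (main) thread; returns how many ran.
    int drain() {
        std::vector<std::function<void()>> tasks;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks.swap(pending_);
        }
        for (auto& task : tasks) task();
        return static_cast<int>(tasks.size());
    }
private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};

// Hypothetical runtime: owns a single notifier, created and destroyed on the main
// thread, so nothing reference-counted around it is touched by the audio thread.
struct Runtime {
    Notifier notifier;
};

// Simulates the audio thread reporting finished channels.
int simulateCompletions(Runtime& runtime, int count) {
    assert(count >= 0);
    int completed = 0;  // only modified by the posted tasks, i.e. on the main thread
    std::thread audio([&runtime, &completed, count] {
        for (int i = 0; i < count; ++i) {
            runtime.notifier.post([&completed] { ++completed; });
        }
    });
    audio.join();
    runtime.notifier.drain();  // main thread runs the completions
    return completed;
}
```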
I'm also now using Lua references rather than heap-allocated objects to track channel usage / `onComplete` callbacks, and passing them along through the `Scheduler`. This avoids some memory churn and simplifies cleanup.
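To show why an integer reference is easier to pass around than an allocated wrapper, here is a stand-in for the Lua registry that doesn't need a Lua build. In the real code the equivalents would be `luaL_ref(L, LUA_REGISTRYINDEX)`, `lua_rawgeti`, and `luaL_unref`, called on the main thread only; `CallbackRegistry` and its methods are hypothetical and only mimic that integer-handle idea.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Lua registry (illustration only).
class CallbackRegistry {
public:
    static constexpr int kNoRef = -2;  // same value as Lua's LUA_NOREF

    // Stores a callback and returns a small integer that can cross threads.
    int ref(std::function<void(int)> fn) {
        if (!freeList_.empty()) {
            int slot = freeList_.back();
            freeList_.pop_back();
            slots_[slot] = std::move(fn);
            return slot;
        }
        slots_.push_back(std::move(fn));
        return static_cast<int>(slots_.size()) - 1;
    }

    // Invokes the callback for a finished channel if the reference is still live.
    bool call(int ref, int channel) const {
        if (ref < 0 || ref >= static_cast<int>(slots_.size()) || !slots_[ref]) return false;
        slots_[ref](channel);
        return true;
    }

    // Releasing just clears the slot and recycles its index.
    void unref(int ref) {
        if (ref == kNoRef) return;
        assert(ref >= 0 && ref < static_cast<int>(slots_.size()));
        slots_[ref] = nullptr;
        freeList_.push_back(ref);
    }

private:
    std::vector<std::function<void(int)>> slots_;
    std::vector<int> freeList_;
};
```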
For that matter, the channel-to-callback mapping now also has some synchronization, via a per-channel `atomic_flag`. I forget the exact details at the moment, but the channel management looked shaky in its previous form (there was definitely some cross-thread access). I haven't tested whether this was relevant to #19.
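A sketch of what a per-channel `atomic_flag` guard can look like: each channel's callback slot is only read or written while its flag is held, so a "take and clear" on completion can't interleave with a concurrent "set" for the same channel. `ChannelTable`, `kChannels`, and the `-2` sentinel are assumptions for illustration, not the engine's actual layout.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical channel count; not the engine's actual value.
constexpr int kChannels = 32;

// Hypothetical per-channel callback table guarded by one atomic_flag per channel.
struct ChannelTable {
    std::array<std::atomic_flag, kChannels> guards;
    std::array<int, kChannels> callbackRefs;

    ChannelTable() {
        for (auto& g : guards) g.clear();  // flags start unset
        callbackRefs.fill(-2);             // LUA_NOREF-style "no callback"
    }

    // Spins until the calling thread owns this channel's slot.
    void lock(int ch) {
        assert(ch >= 0 && ch < kChannels);
        while (guards[ch].test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock(int ch) { guards[ch].clear(std::memory_order_release); }

    // Main thread: a sound starts on a channel with an onComplete reference.
    void setCallback(int ch, int ref) {
        lock(ch);
        callbackRefs[ch] = ref;
        unlock(ch);
    }

    // Completion path: read and clear under the same guard, so at most one caller
    // ever receives a given reference.
    int takeCallback(int ch) {
        lock(ch);
        int ref = callbackRefs[ch];
        callbackRefs[ch] = -2;
        unlock(ch);
        return ref;
    }
};
```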
I've currently only commented out the old code, not removed it. On request, I believe I can explain most, if not all, of the race bugs described in those comments, in light of my analysis.
The previous `mutex` implementation in the `Scheduler` is now also based on atomics.
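I haven't reproduced the `Scheduler`'s actual code here. As a rough sketch of the general shape, an atomics-based lock can be a test-and-test-and-set spin lock that provides `lock`/`try_lock`/`unlock`, so it still works with `std::lock_guard`. `SpinMutex` and `countUnderLock` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>  // std::lock_guard
#include <thread>
#include <vector>

// Hypothetical spin lock built on std::atomic<bool>; usable with std::lock_guard.
class SpinMutex {
public:
    void lock() {
        // Try to grab the lock; if it's taken, spin on a plain load to avoid hammering the cache line.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }
    bool try_lock() {
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> locked_{false};
};

// Increments a shared counter from several threads under the spin lock.
long countUnderLock(int threads, int perThread) {
    assert(threads > 0 && perThread >= 0);
    SpinMutex m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&m, &counter, perThread] {
            for (int i = 0; i < perThread; ++i) {
                std::lock_guard<SpinMutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Spinning is reasonable when the critical sections are tiny, as with queue pushes and pops; for long critical sections a blocking mutex would waste less CPU.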
Attached is an Android build for testing: Corona.aar.zip (You can unzip this and replace `Corona.aar` in your `Native` directory if you want to test on Android. You might want to save your old one, too.)
This is based on this PR and #661.
With this I was able to run the test from #296 on Windows, Mac, and Android without problems such as the loading issues mentioned on Discord or the Back button bug from #663.