Open appfamily opened 10 months ago
Similar issue:
Not sure if it's 100% related, but definitely could be: https://discuss.cocos2d-x.org/t/is-there-any-solution-of-anr/43080.
Another interesting topic: https://issuetracker.google.com/issues/263307511 it's from a native Android app, but a bit below is an example project can be found & a video showing what happens when the ANR pops up
I have the demo, steps, and screen recording to reproduce this ANR. @Shchvova
Use the Lua code as below or modified sample - Fishies to reproduction the Controller.stop()
ANR with Solar2D 3699:
--[[
This is a demonstration for reproducing an ANR issue with
Android Controller.stop()/start().
Triggering the Android lifecycle while Lua is processing heavy
work (as shown below) can replicate the problem.
]]
local sum = 0
local co = coroutine.create(function ()
for i = 1, 100000 do
sum = sum + i
local start_time = os.clock()
while os.clock() - start_time < 0.1 do
end
coroutine.yield(sum)
end
end)
timer.performWithDelay(10, function()
for i = 1, 100 do
if co and coroutine.status(co) ~= "dead" then
coroutine.resume(co)
end
end
end, 0)
The steps are as follows (Using the timeline of screen recording):
00:00:01
Start by testing the app as usual.00:00:13
Enter the game scenes. Due to heavy computation, there is only one frame update at 22 and 32 seconds, which is not significant.00:00:33
Exit or kill the app.00:00:35
Manually restart the app from the desktop. Android Studio's first re-run won't reproduce this ANR.00:00:37
Lock the screen by pressing the power button before fully entering the game scenes. Wait for at least 5 seconds.00:00:44
Unlock the screen.00:00:45
Seeing the ANR pop-up, use adb bugreport
to retrieve it from the phone. This reproduced Controller.stop()
ANR.00:00:50
Exit or kill the app.00:01:00
Manually restart the app from the desktop.00:01:01
Exit to the desktop before fully entering the game scenes.00:01:04
Re-enter the app.00:01:06
Lock the screen by pressing the power button before fully entering the game scenes. Wait for at least 5 seconds.00:01:16
See whether the ANR pop-up appears or not, as the app may have been killed by the system. This reproduced Controller.start()
ANR (sometimes together with Controller.stop()
ANR).The screen recording:
https://github.com/coronalabs/corona/assets/53489032/23696249-eb74-4277-b242-061214caf791
I tested 3699 with a simple patch (either a draft or proof of concept) and the ANR disappeared. However, I'm not familiar with the entire locking process, so I'm not sure about potential effects.
platform/android/sdk/src/com/ansca/corona/graphics/opengl/GLSurfaceView.java
public void surfaceDestroyed() {
synchronized(sGLThreadManager) {
if (LOG_THREADS) {
Log.i("GLThread", "surfaceDestroyed tid=" + getId());
}
mHasSurface = false;
sGLThreadManager.notifyAll();
- while((!mWaitingForSurface) && (!mExited)) {
- try {
- sGLThreadManager.wait();
- } catch (InterruptedException e) {
- Thread.currentThread().interrupt();
- }
- }
+// while((!mWaitingForSurface) && (!mExited)) {
+// try {
+// sGLThreadManager.wait();
+// } catch (InterruptedException e) {
+// Thread.currentThread().interrupt();
+// }
+// }
}
}
platform/android/sdk/src/com/ansca/corona/CoronaActivity.java
private void requestResumeCoronaRuntime() {
...
// Start/resume the Corona runtime.
- fController.start();
+ new Thread(fController::start).start();
...
private void requestSuspendCoronaRuntime() {
// Suspend the Corona runtime.
if (fController != null) {
- fController.stop();
+ new Thread(fController::stop).start();
}
I hope the information helpful. Thank you.
Interesting @clang-clang-clang, is there a reason why you've use coroutines for the example? Wouldn't the ANR trigger otherwise? That would explain why not everyone is seeing the high ANR rate.
I'm not quite sure this is the right solution though. With my limited knowledge of locking in Java, simply removing the wait in surfaceDestroyed()
could result in possible crashes in other methods which think everything is correctly destroyed, while the "destroy" is not actually finished yet.
I did spend some of last week on the ANR issue as well, and have actually come up with a different solution by looking at the libgdx library which also uses Android's GLSurfaceView.java. They had similar issues on the onPause
state and avoid it by using two different methods.
onPause
takes longer than 4000ms (ANR triggers at 5000ms)updateRuntimeState()
where not everything is locked/synchronized all of the time, just the moments when controller
is actually used.Start of last weekend I've integrated these solutions in my own fork of Solar2D (https://github.com/pouwelsjochem/solar2d/blob/40f0134d8ee7447373d0900257909f36fd863ba3/platform/android/sdk/src/com/ansca/corona/Controller.java#L222) and slowly rolled that out to my users. Our ANR rate of this last build decreased by roughly 90%, from ~1% to ~0.1% (over 14K sessions in the last 24 hours). I believe Vlad is going to incorporate my changes into the main repo soon.
Great news. Glad to hear there's a solution.
Since I can't reproduce the online situation, I can only approximate the ANR by simulating the calculated pressure. Just keeping the GLThread busy and triggering the lifecycle can cause an ANR.
As for the 'patch' (it's a draft or proof of concept), it can't been production yet. I suspect that the sGLThreadManager
also has a locking issue.
Describe the bug Play Store has been reporting ANRs deep inside Solar2d for years. It has been discussed in forums, such as here but without conclusive results.
There is a new built-in analysis of ANRs in the Play Store and perhaps it can shed light on the root cause? See below
To Reproduce Steps to reproduce the behavior:
Target platform and where build was made:
Additional context
whole test project or build.settings Available on demand