Open neuronix opened 6 years ago
App with Air 27, Starling 1.8, Feathers 2.x ANR: 0.20% Crash rate: 0.23%
Hi,
Our App(Starling 1.8-AIR 26/21):
ANR Rate : 4% Crash Rate: 5%
39 reports last 7 days:
Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50000010 (has extras) }
"main" prio=5 tid=1 Native | group="main" sCount=1 dsCount=0 obj=0x74de5000 self=0xb7d81f78 | sysTid=21001 nice=-6 cgrp=bg_non_interactive sched=0/0 handle=0xb6f01bec | state=S schedstat=( 883042412234 69541281861 500913 ) utm=83394 stm=4910 core=1 HZ=100 | stack=0xbe568000-0xbe56a000 stackSize=8MB | held mutexes=
at com.adobe.air.customHandler.callTimeoutFunction (Native method) at com.adobe.air.customHandler.handleMessage (customHandler.java:22) at android.os.Handler.dispatchMessage (Handler.java:102) at android.os.Looper.loop (Looper.java:135) at android.app.ActivityThread.main (ActivityThread.java:5912) at java.lang.reflect.Method.invoke! (Native method) at java.lang.reflect.Method.invoke (Method.java:372) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:1405) at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1200)
10 reports last 7 days:
Input dispatching timed out (Waiting because the touched window has not finished processing the input events that were previously delivered to it.)
"main" prio=5 tid=1 NATIVE | group="main" sCount=1 dsCount=0 obj=0x415a4ca8 self=0x414de570 | sysTid=22564 nice=-6 sched=0/0 cgrp=apps handle=1074331988 | state=S schedstat=( 0 0 0 ) utm=4477 stm=515 core=2
at com.adobe.air.customHandler.callTimeoutFunction (Native Method) at com.adobe.air.customHandler.handleMessage (customHandler.java:22) at android.os.Handler.dispatchMessage (Handler.java:102) at android.os.Looper.loop (Looper.java:136) at android.app.ActivityThread.main (ActivityThread.java:5017) at java.lang.reflect.Method.invokeNative (Native Method) at java.lang.reflect.Method.invoke (Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:779) at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:595) at dalvik.system.NativeStart.main (Native Method)
Thanks to both of you! Since both apps have such different rates, could you give a little extra info?
Do you have little or a lot of user inputs ? Yes, we have. Do you have any input intense situations like tapping or dragging on the screen? Yes, we have. Do you have ANEs included in the app (notably ads) ? Ads: AdCel, AdColony, SuperAwesome Some District Anes.(Contacts/mediaPlayer/flurry/more) Goviral Custom Anes.
Thanks @Seb-AS, will be interesting to compare with @skolesnyk's app regarding the amount of user inputs.
Scrolling lists, tapping on Sound Player play/pause buttons, text entry. There's FacebookAPI ANE integration, so some users switch to Facebook native app or browser and then back to the app. Using about 14 ANEs, mostly Distriqt, but also BranchIO, Fyber (with 2 ad networks): showing interstitials without user's action and showing Fyber Offerwall after user's interaction with UI.
App with Air 25, Starling 2.2 ANR: 7.5% Crash rate: 4.0%
Broadcast of Intent { act=android.intent.action.SCREEN_ON flg=0x50000010 } Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50000010 } Broadcast of Intent { act=org.herozero.NOTIFICATION_DELAYED and a lot of Input dispatching timed out errors
App with Air 25, Starling 1.8 ANR: 2.6% Crash rate: 2.2%
Broadcast of Intent { act=android.intent.action.SCREEN_ON flg=0x50000010 } Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50000010 } and a lot of Input dispatching timed out and keyDispatchingTimedOut errors
Do you have little or a lot of user inputs ? Yes, in both apps. Do you have any input intense situations like tapping or dragging on the screen? First app has more taps, second a lot of swiping. Do you have ANEs included in the app (notably ads) ? Both Apps use a lot of distriqt ANEs (Application, Billing, Facebook, ...) and for ads the Appodeal ANE.
Thank you @MathiasFabian for sharing!
@PrimaryFeather this issue seems to be plaguing a lot if not just about all AIR apps. Could you get someone from the AIR team to look into this?
App with Air 22 and Starling 1.9 ANR: 2.79% Crash 0.51%
Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 5. Wait queue head age: 5502.9ms.)
Do you have little or a lot of user inputs ? No. Do you have any input intense situations like tapping or dragging on the screen? Yes, the app is an educational game for kids Do you have ANEs included in the app (notably ads) ? Distriqt Extension Files for Android app only.
com.adobe.air.AIRWindowSurfaceView.onKeyUp (AIRWindowSurfaceView.java:395)
com.adobe.air.customHandler.callTimeoutFunction
.
The description for it:
Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 2. Wait queue head age: 5926.5ms.)
Actually, all of our ANRs are variations of "Input dispatching timed out". Not sure if that's normal.EDIT: The Google Play Console was being extremely flakey – our most common ANR error was in fact Broadcast of Intent { act=android.intent.action.SCREEN_OFF ...
They accounted for 77% of all ANRs, the rest being Input dispatching timed out
.
One thing to note is that the stack traces for both errors mention com.adobe.air.customHandler.callTimeoutFunction
EDIT2: The Play Console is definitely flakey for us – using the same filters and time period as i used previously it's saying 90% of our ANR errors are Input dispatching timed out
, 10% Broadcast of Intent { act=android.intent.action.SCREEN_OFF ...
.
Could it be these are the same error? We've spoken to Google and they say their tech team are having a look.
We can replicate the ANRs but it takes a while and they seem to happen randomly. We're completely flummoxed. The ANRs can happen on a completely static screen with only very occasional input. I suspected a memory leak but Scout says no.
Concerned our ANEs are mentioned here, if this is related to us we will chase down the issue immediately, though I'm fairly confident it's not.
Is anyone having this issue without distriqt's ANEs?
@marchbold we use Milkman ANEs, I don’t think it’s related to your ANEs in particular, my gut feeling is an issue in the Android/AIR input handling.
Yeah, that's my feeling too, just was a little concerned when saw our extensions mentioned several times. We did actually do a thorough review of our ANEs to identify any potential causes when we encountered this issue in our apps and couldn't find anything.
Our largest app:
ANRs are mainly due to two causes:
Broadcast of Intent { act=android.intent.action.SCREEN_OFF ...
Input dispatching timed out (Waiting because the touched window has not finished ...
Similar logs to the ones above.
We use Fyber Ads ANE
Crashes are mainly related to Fyber:
java.lang.NullPointerException com.fyber.ads.videos.a.b.a RewardedVideoClient.java
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR)
Our most common crashes over last 60 days:
com.adobe.air.AIRWindowSurfaceView.onKeyUp (AIRWindowSurfaceView.java:395)
com.adobe.air.AIRWindowSurfaceView.surfaceChanged
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR)
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR)
We are also seeing very similar ANR rates and similar causes:
Broadcast of Intent { act=android.intent.action.SCREEN_OFF ...
Broadcast of Intent { act=android.intent.action.SCREEN_ON ...
Input dispatching timed out (Waiting because the touched window has not finished ...
Here are some stats on our app:
We also don't think that these ANRs are extension related. When we have had some other ANRs caused by extensions we were able to see a clear stacktrace to the extension code.
Do you have little or a lot of user inputs? Typically 1 or 2 taps every few seconds Do you have any input intense situations like tapping or dragging on the screen? No Do you have ANEs included in the app (notably ads) ? Answered above
We're seeing ANRs and crashes above the bad behaviour thresholds in our app. • ANR rate: 0.73% • Crash rate: 1.47% • Air 23.0, Starling 1.8.1, Feathers 2.2
Do you have little or a lot of user inputs ? Not that much user input.
Do you have any input intense situations like tapping or dragging on the screen? Tapping, yes, but never really intense tapping. Also dragging the screen on some pages. ANRs have happened when neither tapping nor dragging was being done by the user, and no intense computation was being done by our app.
Do you have ANEs included in the app (notably ads) ? Yes, but no ads: • Milkman GoViral, RateBox • Distriqt – androidsupport.V4, Core, Firebase, Crash, GoogleIdentity, Notifications, playservices.Auth, playservices.Base, playservices.Identity, Resources • pl.mateuszmackowiak.nativeANE.properties.SystemProperties.ane
Typical ANR reports:
Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago
Input dispatching timed out (Waiting because the touched window has not finished processing the input events that were previously delivered to it.)
Typical crash traces:
java.lang.NullPointerException:
at com.adobe.air.AIRWindowSurfaceView.onKeyUp (AIRWindowSurfaceView.java:395)
at android.view.KeyEvent.dispatch (KeyEvent.java:2907)
at android.view.View.dispatchKeyEvent (View.java:7543)
java.lang.NullPointerException:
at com.adobe.air.AIRWindowSurfaceView.surfaceChanged (AIRWindowSurfaceView.java:735)
at android.view.SurfaceView.updateWindow (SurfaceView.java:731)
at android.view.SurfaceView.onWindowVisibilityChanged (SurfaceView.java:322)
Hi, Could you please a log a bug at http://tracker.adobe.com/ and attach a sample application where this issue is reproducible.
Thanks, Ankit | Adobe AIR Engineering
Hi @Ankit-Adobe, Thanks for joining the conversation. There is already a bug in the tracker here: https://tracker.adobe.com/#/view/AIR-4198519
We have tried to reproduce the bug but without success so far. Could you give us more information on when/how this error is triggered? It always seems to originate from com.adobe.air.customHandler.callTimeoutFunction
Hi guys, we are having the exact same errors on all our apps, we have also updated the forum topic (https://forum.starling-framework.org/topic/anrs-crash-rate-survey?replies=5#post-109884) and voted and commented on the tracker url provided.
• ANR rate: 3.60% • Crash rate: 0.90% • Air 26.0, Starling 2.2 (2017-06-27), Feathers 3.1.2 (January 2017)
Thanks,
Just a note to say i made an edit to my original post above (the main ANR error was wrong)
Just a thought (actually an idea of Robby Campano): could it be that the ANRs we are seeing are the result of users hitting the screen while the apps are restoring from a context loss? After all, this takes some time, and the textures are restored synchronously, i.e. all input will be blocked.
I could switch the texture restoration to being asynchronous, but it's been reported that this is not entirely reliable on some Android hardware. Could be worth a shot, though — this requires only a small change inside the AssetManager (I think). Then I could make this an "opt-in" flag.
Any comments?
As a matter of fact I was just testing that, these are the results:
Turns out that, -when the user locks the device- the app has a harder time recovering from a context loss when the app is in foreground rather than in background. At least it took significantly more time on my device (Samsung Galaxy S6).
If I send the app to background, then lock the device, unlock it back, and bring it to foreground again the restoration is still visible, but it takes none to very short time.
When I try it the other way around (app was in foreground when the device got locked), while the app is restoring, I can still play around and click stuff when, according with what you just said, all the input should be blocked. I have to say though, that I can't really tell an exact restoration time for each example, since sometimes my device takes longer because, I don't know, it doesn't like my face :)
I'm not entirely sure if I'm missing something but I can definitely mess around with the app when recovering from the context loss. That said, I still haven't been able to reproduce the ANR itself on my phone while debugging.
Thanks for the additional info, Christian!
I'm not entirely sure if I'm missing something but I can definitely mess around with the app when recovering from the context loss.
You're totally right: it depends on the size of the textures being restored. I imagine it takes a while loading a 4k*4k texture, for example; and during that time, input will be blocked. Loading several smaller ones should not be that much of an issue, because there's a gap for AS3 to process stuff between each upload process.
I see, perhaps what I'm about to suggest isn't doable, but what if the whole Stage was left untouchable as soon as the restoration begins, and it gets back to touchable as soon as the TEXTURES_RESTORED
event gets dispatched? Could that be an option? I understant that during that restoration time no ANR's could be generated since the touch events would not be triggered, right?
Hm, no, that wouldn't help. Since it's a synchronous operation, all of AS3 stops accepting input — and that includes the mouse events of the classic Flash (which Starling otherwise queues). The only solution would be to avoid that synchronous (time consuming) operation.
Again, I don't know if that's really the issue — it's just a possibility.
I do not think this is related to context loss. ANRs occur during gameplay periods according to our user reports. However it seems to occur after a period without user inputs:
Case 1: match 3 game, users report freezing often after waiting to think on a move (5s maybe) Case 2: timer based game, users report freezing always when about 4-5s left out of 15, so again probably a short pause between inputs.
ANRs occur during gameplay periods according to our user reports.
Aha, that's useful information — thanks!
Update on Google Play Console showing unreliable figures for Broadcast of Intent { act=android.intent.action.SCREEN_OFF ...
and Input dispatching timed out
ANR errors: Google got back to us saying it's a bug in their system and the two errors are the same.
@chichilatte Did Google say anything else in specific? I mean, did they really assume these ANR's are they fault? If so, did they give you an ETA on how to fix them? I don't quite get how this is Google's fault though, it would make sense if these ANR's only appeared on the last Android version, but at least on our case it's happening from Android 5.0 to 7.1 so how is this their fault? Any info would be really appreciated since all our games are starting to lose visibility because of this high ANR % index.
@cgascons I think there is a misunderstanding here, the issue chichilatte is talking about is Google not merging two identical ANR reports, not that the ANRs are Google’s fault.
Oh I see now, @neuronix you are totally right, sorry about it!
Here's a thought, we tested creating 1 million objects at once, which causes our app to hang for about 5 seconds, and we're getting an ANR once the app starts responding again... as soon as GC kicks off. We pulled the logs from the android device and noticed the same line in the error log as we're seeing from the Google Play Dashboard. The following line is from our device log at the very bottom of the stack trace: "HeapTaskDaemon" daemon prio=5 tid=6 Blocked
Here's our code that allowed us to generate the ANR:
var temp:Array = new Array(); for (var i:int = 0; i < 1000000; i++) { var s:Sprite = new Sprite(); temp.push( s ); }
Awesome find @soccerob ! I hadn't thought of the GC, I only tried hanging the app with heavy calculations.
@Ankit-Adobe please check out this reproductible case!
Although our app doesn't ever construct this many objects at once, the thought is that the ANR might still be caused during garbage collection. If that's the case, we might look into object pooling some of our more robust classes to be re-used and never be marked for garbage collection. We're running through Scout and noticed that a context loss also triggers a significant task for garbage collection. Still investigating... but would love some additional input on this.
Just checked our Google ANR reports for the last week:
83% have HeapTaskDaemon daemon prio=5 tid=6 Blocked
as the last line in the stacktrace. The others have GC
or GCDaemon
in the stacktrace (usually the first line after main
).
We also have HeapTaskDaemon daemon prio=5 tid=6 Blocked
at the end of some our ANR stacktraces, the others have Signal Catcher.
Not sure what to think, we do know for sure though, that these ANRs started increasing right after publishing an update on our client which "theoretically" was meant to increase overall performance because there were lots of objects we were using in some classes that weren't being properly destroyed. Now that all of those are being properly flagged for collection, the GC seems to be having a hard time.
Also have HeapTaskDaemon daemon prio=5 tid=6 Blocked in most ANR stack traces, hope @Ankit-Adobe can use this to implement a fix! :)
@cgascons if that indeed caused your ANR rate to go up, this may very well be the culprit. ironic that by cleaning up your objects properly, you've now managed to cause a worse issue than when the objects were just leaking memory
Indeed @soccerob, it's quite a shame. We are getting ready a new version with "better" GC management than before. What we are doing is attempting to "force-calling" the gc more often by making use of a very dirty hack we found around (even though we are not sure if it's still working). Basically some people claim that making this call:
try { new LocalConnection().connect( "foo" ); new LocalConnection().connect( "foo" ); } catch ( e:Error ) {}
forces the GC to kick in. We are about to create the release version with advanced telemetry enabled to test on Scout and make sure this still works.
If it does work, we will publish this new version hoping to decrease the ANR's until Adobe comes back with a solution for this.
Does System.gc() not work? We also added gc calls in strategic end points to our app as a temporary workaround. In Scout I can see the gc spikes that match the calls in the flash version, haven't checked AIR yet though.
Hi neuronix, as far as I know System.gc()
only works on a Debugger
Would have been to good.. Well there is this, but not guaranteed to actually trigger GC: https://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/system/System.html#pauseForGCIfCollectionImminent()
I'm pretty sure we used that call a long time ago. Quite ashamed to be honest I don't quite remember why we did stop using it.. Anyways, will give it a try as well and share it here if it also works fine and it's noticeable in Scout.
Hi guys! I'm also affected by this ANRs.
I've been using System.pauseForGCIfCollectionImminent() since the beginning, I can't tell if it makes a difference. With Air + Flex SDK 26 and after making many performance improvements (reduced draw calls from ~150 to 20) I managed to drop the rate from 2.5% to 1.5%. In my latest build I tried using Air 28 with the ASC compiler and the ANRs jumped again to 2.5%.
Still clueless about what's the actual trigger for the ANRs.
As of today I can also confirm that, after even adding some more performance tweaks and switching from System.gc()
to System.pauseForGCIfCollectionImminent()
, the ANR's are still present, always preceded by the GC error block.
I wonder, would it help if you did System.pauseForGCIfCollectionImminent()
with a low figure, e.g. System.pauseForGCIfCollectionImminent(0.2)
? It defaults is 0.75 otherwise. With a smaller number GC would happen more frequently and take less time. Course it all depends on how often you call the function.
Indeed chicilatte, one should use a very small number. We'll be testing with System.pauseForGCIfCollectionImminent(0.1) and see how things go.
Uhmm, I'm not quite sure if that's going to make a huge difference to be honest, but it's worth a try. It's just exasperating to be in this situation...
I'm forwarding a question from Adobe:
Is it possible for [you] to share the details of the devices along with OS version where they are seeing ANR’s and Crashes?
Is this information available somewhere with the ANR stats?
Here is an report from our Google Dev Console. BroadCast of intent is the same as the Input dispatch ANR.
Hi guys,
I am mirorring my post from the Starling forum here with Daniel's permission.
We have ANRs on Android in all our apps and would like to know if it's maybe our apps particularly, or if it's AIR related.
Could you please check your Google Dev Console and go to "Android Vitals" > "Overview".
The first section is "Bad Behaviors".
Please share the following values (no need to mention the app, just pick the biggest app you have or the average of all your apps please):
The main ANR we are experiencing is "Input dispatching timed out" triggered by "com.adobe.air.customHandler.callTimeoutFunction". There are heaps of topics on it on the web, and they only proper response by Adobe is that we should avoid doing heavy calculations on user input.
But all our apps use Starling, and Starling defers all user inputs to next frame so this should not be an issue.