airsdk / Adobe-Runtime-Support

Report, track and discuss issues in Adobe AIR. Monitored by Adobe - and HARMAN - and maintained by the AIR community.

ANR Input dispatching timed out #1831

Open raresn opened 2 years ago

raresn commented 2 years ago

Input dispatching timed out (ee8565d air.ro.fxstudio.agenda/air.ro.fxstudio.agenda.AIRAppEntry (server) is not responding. Waited 5001ms for MotionEvent)

Input dispatching timed out (air.ro.fxstudio.agenda/air.ro.fxstudio.agenda.AIRAppEntry, dab185f air.ro.fxstudio.agenda/air.ro.fxstudio.agenda.AIRAppEntry (server) is not responding. Waited 5010ms for KeyEvent=0)

These two happened over 90 times in total.

Can you tell us what other info we can provide to try to reduce them?

ajwfrost commented 2 years ago

I don't suppose you're able to see whether these happened during normal execution of the app, or right at start-up? We've got a couple of things we're working on that should result in a drop in these sorts of issues. There are three basic possibilities:

1) it's during start-up, if the user clicks on the screen while the application is still loading, where the loading can take 5s+ if there are a lot of things going on with asset loading, ANE initialisation, etc..
2) it's in normal operation but where there's some long-running / blocking operation happening. 5s would be quite a long time but it's possible perhaps that we're doing a big memory operation or synchronous load/decode of something..
3) or something has actually hung ... this is where we had been finding some issues earlier, particularly with the sound functionality, where we ended up with a thread deadlock and the application just hung until the OS killed it.

I hope that the latest SDK has resolved all of these though....

For the first of these, we are looking to bring in AOT compilation for Android which should dramatically reduce the start-up time, but it's also worth trying to structure an application so that it loads things in over multiple frames at start-up.

The second is something where (a) we're aiming to move everything off to a secondary thread anyway, which means the user experience may stay the same (slow responsiveness) but the OS wouldn't notice and hence wouldn't cause an ANR; or (b) we should be able to get some more diagnostic information out where there are long-running synchronous activities happening..

thanks

pitanello commented 2 years ago

"Input dispatching timed out" is also a big problem for my games, and it is the only one that raises the Core Vitals ANR rate in the console. Over the years I tried everything I found online (reducing loops, moving the click event code into a separate function, etc.) but nothing really changed, and I am quite positive the main problem could be this:

"1. it's during start-up, if the user clicks on the screen while the application is still loading, where the loading can take 5s+ if there are a lot of things going on with asset loading, ANE initialisation, etc.."

I did a test quite a while ago with an old Galaxy Tab: I created two empty apps and attached a few ANEs to one of them, with no code implementation. The one with the ANEs took quite a long time to load, and I was wondering if it was possible to load the ANEs at runtime instead of at startup.

ilushaaa commented 2 years ago

"For the first of these, we are looking to bring in AOT compilation for Android which should dramatically reduce the start-up time, but it's also worth trying to structure an application so that it loads things in over multiple frames at start-up.

The second is something where (a) we're aiming to move everything off to a secondary thread anyway, which means the user experience may stay the same (slow responsiveness) but the OS wouldn't notice and hence wouldn't cause an ANR; or (b) we should be able to get some more diagnostic information out where there are long-running synchronous activities happening.."

Do you have an approximate timeline for implementing these features? Thanks

Elintondm commented 2 years ago

This is our main problem too!!!

@ajwfrost please choose the fastest way to solve this problem. I think the solution "move everything off to a secondary thread anyway" would be the easiest choice for now and the other approaches could be done later.

Please this is very urgent for us.

ajwfrost commented 2 years ago

@pitanello dynamic or run-time loading of ANEs is an interesting one .. I can't actually see a good reason why that wouldn't be possible, other than the fact that you would need to be very careful when using the libraries as you don't want to try accessing an ActionScript definition that's not been loaded yet. There are two stages to the ANE loading:

1) parse the SWC library file, i.e. all the AS3 glue logic that you write for the ANE
2) load the native library and call the initialisers (although on Android with Java ANEs, this is pretty much a no-op since the Java definitions were already compiled into the application at build time).

Sometimes the first of these steps does actually take a bit of time. We did some profiling for a customer a few years ago and found that their SWC library had loads of utility ActionScript classes for communicating between the ANE and the main app, and each of these classes was in its own ABC block in the SWC file. This adds a surprising amount of delay to the loading/parsing as the definitions are not all in the same 'pool' .. this customer saved a lot of time by just using a tool called abcmerge to simplify the library SWC file. We had started looking to see if we could build a similar tool into the workflow, but it's quite a complex task to merge different ABC blocks...

Of course, if we did have AOT compilation, then the fact that definitions are in different pools wouldn't be a massive issue and would translate into a compile-time problem rather than a run-time/loading issue. But dynamically loading the definitions would perhaps be a simpler way (i.e. less effort for us) to spread out the start-up timings....

The challenge then is, how do you load them? I'm guessing that mostly you are just calling functions from your ActionScript such as:

import com.distriqt.extension.application.Application;
public class MyClass
{
  public function getID() : String
  {
    var deviceID:String = Application.service.device.uniqueId( IDType.VENDOR );
    return deviceID;
  }
}

but you wouldn't be able to call that method until the ANE's library file had been parsed. Within that API, no doubt there is a call to ExtensionContext.createExtensionContext() but that's the detail that's hidden from you.

It would perhaps work if we had a function such as ExtensionContext.loadExtension(extension_id) but you would have to call this before the ActionScript virtual machine entered that getID method otherwise the code would throw a VerifyError.

@ilushaaa we've got the AOT partly working - i.e. most of the build-time work is done, but we need to work on the toolchain for actually building an Android application where this then works. I was hoping it would be ready around the summer time, but we're finding a lot of these bigger tasks are getting shunted a little due to a lot of smaller requests.

@Elintondm your message just came in whilst writing this .. so the "move everything to a different thread" is something we can revisit, I don't know whether this is actually the quickest/simplest solution though because of how the thread handling is already done within AIR. The danger being that we introduce additional problems (and even deadlock) if we try this..

Let me check with the developer who had done the initial look at this "other thread" idea and we'll see if we think we can get something working with a basic test case, and then we should be able to see how much additional effort there then may be..

thanks

pitanello commented 2 years ago

It would be interesting to know if @marchbold could provide some implementation examples for his ANEs

marchbold commented 2 years ago

Happy to make any changes to our ANEs to improve load times but it's out of our control currently.

Also, in our research, compacting the SWCs in the ANEs to reduce the ABC blocks in them fails, as they are no longer recognised as valid ANEs by AIR.

FYI with all our ANEs the call to ExtensionContext.createExtensionContext() is made the first time you access the extension singleton, e.g. Application.service.

ajwfrost commented 2 years ago

Yes @pitanello I should have added, we'd already discussed a lot of this with @marchbold, the compacting of ABC blocks in Adobe's optimizer tool is failing for some reason hence we had started to look if we could create something for this as part of our build workflow. I'm actually thinking perhaps that now we should instead look at the original compilation step via the 'compc' tool, as the 'mxmlc' tool already does this merge, so that may be a quicker alternative..

The createExtensionContext does a bit more work on extensions using native (C++) libraries, so I guess overall the dynamic loading of ANEs isn't going to have a massive impact... although I am interested by your mention of the delays when you tried with/without ANEs on an older device. I suspect it could be reasonably easy to make this change from our perspective and it wouldn't need any changes from the ANE vendor/library, it would just mean your code would need to manually trigger the loading of the ANE prior to any use of its API...


On the 'calling everything from another thread' front, we are investigating how this could be done ... there would need to be quite a number of significant shifts to how the rendering and event handling is done, it's certainly not a trivial task..

pitanello commented 2 years ago

Thank you @ajwfrost. From my point of view it would be perfect to manually trigger the loading of the ANE and wait for the event that completes the process in order to initialize the native extension.

megajogos commented 2 years ago

The capability to manually load an ANE would be very nice! @ajwfrost

We would do anything to avoid the ANRs!

megajogos commented 2 years ago

@ajwfrost Any news about this?

This is very urgent for us! This is the biggest problem that we have!

ajwfrost commented 2 years ago

@megajogos there are a few suggestions listed here..! so just to check what it is you're after?

Probably the first of these is actually the simplest because it is a more discrete change. The other approaches are pretty large in terms of both effort and the risk of regression...

thanks

ajwfrost commented 2 years ago

@megajogos are you able to email me directly please, we have an initial build of the runtime where everything that could take a while is now handled in a different thread. It's working now for our basic test cases in cpu, direct and gpu modes, but we're unlikely to have exercised the whole set of possible conditions so are looking for some real-world testing...

thanks

megajogos commented 2 years ago

Hi @ajwfrost

Could you send your email? We are glad to help!

ajwfrost commented 2 years ago

andrew dot frost at harman dot com (vague attempt to stop me from getting hit with a lot of spam, although sadly I think it's a bit late now...!)

thanks :-)

megajogos commented 2 years ago

Hi @ajwfrost!

We have tested the new version (33.1.1.889) and the good news is: the ANRs at startup are gone!

The bad news is: input is not working. It seems like mouse events are not dispatched to our app.

In logcat we noticed some errors related to tasks that are required to run on the UI thread: log_mega_113.0.136.txt

megajogos commented 2 years ago

It seems like some operations performed by the distriqt ANEs require the use of the UI thread.

We noticed this error in the Adverts and Application ANEs. @marchbold

I think it would be necessary to have an API method in Adobe AIR to run something on the UI thread. Like:

runOnUiThread(func:Function)

marchbold commented 2 years ago

Hi, yes, we definitely need to run a lot of code on the UI thread and our ANEs rely on this. Particularly Adverts, where AdMob requires a lot of the code to be run on the UI thread. What has changed in this new build?

marchbold commented 2 years ago

We likely can handle this change in the ANE but this is a major change to the design of ANEs if the function calls are not called on the ui thread any more and we'll need a significant amount of time to change all the extensions over!

ajwfrost commented 2 years ago

@marchbold so it may be simpler if we switch things so that calls into the native code from ANEs are done on the main thread still... although we'll have to check whether this is possible to do given that ANEs can then call back into ActionScript which needs to always be run from one thread. Please hold off any changes yet..!

The change here is that we have moved the whole of the ActionScript VM into a secondary/background thread, so that we can avoid ANRs. So the whole application now runs in a background thread, with some synchronisation points where we need to deal with the Android view.

We can look at what happens if we do switch things so that ExtensionContext.call methods are moved back onto the main thread, which may be an easier workaround than having to change every ANE...

thanks

marchbold commented 2 years ago

Yeah, great, got me worried that we were going to have to do a massive update to all our extensions! :)

Might be nice to have a way to migrate to this approach slowly though, eg an alternative callInBackground function on an ExtensionContext or a flag to the call function?

There are definitely times when being off the UI thread for ANE calls is super advantageous!

FliplineStudios commented 2 years ago

Are these Android changes in the latest .889 release on the website? I don't see any mention of VM changes in the Release Notes PDF, and don't see the version on the site being marked as a beta release on the Release Notes page.

Just curious since with Google enforcing required updates to all existing apps in November, we'll be needing to update a large back catalog of apps soon, and will have to decide on a stable AIR SDK version to use for updating and repackaging everything. As much as I'm looking forward to ANR reductions and things like AOT compiling for Android, all of these big proposed AIR SDK changes right when we're looking for stability has me a bit wary of downloading SDK updates...

ajwfrost commented 2 years ago

@FliplineStudios yes don't worry, these are major changes that won't be in a normal release yet! plus I am thinking we would try to make them opt-in, so that people who are good with the existing threading model are able to continue with the known situation there.

ajwfrost commented 2 years ago

@marchbold so we've looked a bit at the extensions, and what we can do here. The challenge is that the functions are normally called within the ActionScript VM main thread, and that's what we're now looking to shift away from being the same as the UI thread. It is of course possible to pause our new 'VM thread', call the extension on the UI thread, and (probably) wait in the VM thread until the UI thread has finished executing the extension method.

The problem with that though is that if the extension method calls any AS3 APIs, including FREObject.newObject(), then the VM will assert because it's being called from the wrong thread (i.e. you get the FREWrongThreadException being thrown).

So I think we may be able to do something where we have:

1) 'normal' extension functions that are run in the VM thread and have full access to the FREObject APIs - but then would hit the problem where they're not able to call View methods or anything else that's required to be on the UI thread
2) 'async' extension functions which are run on the UI thread but that can't access any FREObject APIs to call any ActionScript stuff.

Having said that, for the second option here we would potentially be able to have these return primitive values (bool/int/string/etc) and for the ExtensionContext 'call' method to wait for these to complete, if that was necessary, so they would then be synchronised... but I don't know whether that would be useful anyway as it depends on what else these "run on UI thread" methods would need to do.
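A minimal sketch of the "run on the UI thread and wait for a primitive result" idea, written in plain Java so that it runs without Android or the AIR runtime. Everything in it is an assumption for illustration: BlockingUiCall, callAndWait and the "fake-ui" executor are hypothetical names, and the single-thread executor only stands in for the Android UI thread. It is not how the AIR runtime implements ExtensionContext.call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of a blocking "call on the UI thread" bridge; not AIR code. */
final class BlockingUiCall {
    // Single-thread executor standing in for the Android UI thread (assumption).
    private final ExecutorService ui =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "fake-ui"));

    /** Called from the VM thread: posts the body to the UI thread and blocks for its result. */
    <T> T callAndWait(Callable<T> body, long timeoutMs) {
        Future<T> pending = ui.submit(body);
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the UI thread", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("UI-thread call failed or timed out", e);
        }
    }

    void shutdown() {
        ui.shutdown();
    }

    /** Runs two calls and reports the thread they ran on plus a primitive-style result. */
    static String demo() {
        BlockingUiCall bridge = new BlockingUiCall();
        try {
            Integer width = bridge.callAndWait(() -> 320, 1000L);
            String where = bridge.callAndWait(() -> Thread.currentThread().getName(), 1000L);
            return where + ":" + width;
        } finally {
            bridge.shutdown();
        }
    }
}
```

The timeout is a design choice in the sketch: if the UI thread were itself blocked, an unbounded wait on the VM thread could reproduce the kind of deadlock mentioned earlier in this thread.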

Obviously the alternative is for us not to change how the extension methods are called, and for the extensions themselves to be updated to run stuff on the UI thread. I don't know whether the limitations in the 'run on UI thread' approach means that you'd always need to do a lot of changes to the code..? I'm trying to work out whether we could adapt the FREObject.newObject mechanism for primitives so that it still works, but without creating the ActionScript equivalent of this until the function returns and we drop back into the VM thread. So that would mean perhaps that simple cases where you're just returning true/false or strings etc would still work, but if you're calling any ActionScript methods, it's definitely not going to work..

Let me know what you think! Trying to avoid having to do a lot of rework in the ANEs here (although I guess there's still going to have to be a way to specify whether a function needs to be called from the UI thread or not.... so, to get this working, it's going to need some changes).

thanks

marchbold commented 2 years ago

@ajwfrost Lots to digest here. It seems as though whatever path you take it is going to need work on the extensions. So with that in mind I'd prefer to take a step back and consider what's best going forward for extension development as a whole.

As long as we still have access to the main activity and can trigger a runnable via runOnUiThread, we should be able to relatively quickly wrap any UI FREFunctions in a runnable and post that to the UI thread. This assumes that a dispatch event call on the extension context will still work from within that UI thread (which I believe it should?).

It may mean we'll have to change the API on some of these calls if they were synchronously returning a value. Major concern is around manipulation of views (eg the position of an AdView) if we have to make the retrieval of the coordinates async.

I'd probably have to go through all our extensions and see exactly what does and doesn't need to be run on the ui thread. My initial feeling is not a lot but I do know things like the AdMob SDK have a lot of functionality that needs to be run on the ui thread.

Might be good to get our hands on this version and see what works.

ajwfrost commented 2 years ago

@marchbold thanks ... yes I don't know that there's a simple way for us to make it all work without there being any changes :-(

When testing it here, this is exactly what I've been doing i.e. just taking the FREFunction call implementation and renaming it to callAsync, and adding a call implementation that uses the context to then use runOnUiThread() to invoke the async version.

Yes you can still call the FRE dispatch status event, that goes into a queue in the runtime and gets processed by the main thread within the core AS3/rendering loop. Obviously it's just using strings, hence the possibility of something similar where we just support primitive objects, but the concern would be if an async function tried to access an AS3 object argument that had since been garbage collected (or moved in memory or whatever) -> which means that actually, the synchronous call implementation would need to unpack any FREObject arguments into their Java equivalents, and use these to call the function on the UI thread.
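The call/callAsync split and the argument unpacking described above could look roughly like the following. This is a sketch under stated assumptions, not the real ANE API: UiPoster stands in for the Activity (only its runOnUiThread method is modelled), UiBoundFunction is a hypothetical base class rather than com.adobe.fre.FREFunction, and plain Object/String values stand in for FREObject arguments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an Activity: only models runOnUiThread. */
interface UiPoster {
    void runOnUiThread(Runnable action);
}

/** Hypothetical base class: the original function body moves into callAsync. */
abstract class UiBoundFunction {
    /** The original function body, now given plain Java values only. */
    protected abstract void callAsync(String[] args);

    /** Entry point invoked on the VM thread. */
    public final void call(UiPoster activity, Object[] vmArgs) {
        // Unpack the arguments while still on the VM thread, so the UI-thread
        // code never touches VM-owned objects that may have been collected.
        String[] unpacked = new String[vmArgs.length];
        for (int i = 0; i < vmArgs.length; i++) {
            unpacked[i] = String.valueOf(vmArgs[i]);
        }
        activity.runOnUiThread(() -> callAsync(unpacked));
    }
}

final class CallAsyncDemo {
    /** Posts one call to a fake UI thread and returns what that thread logged. */
    static List<String> run() {
        ExecutorService ui = Executors.newSingleThreadExecutor(r -> new Thread(r, "fake-ui"));
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        UiPoster activity = ui::execute;

        UiBoundFunction showBanner = new UiBoundFunction() {
            @Override
            protected void callAsync(String[] args) {
                log.add(Thread.currentThread().getName()
                        + " showBanner(" + String.join(",", args) + ")");
            }
        };
        showBanner.call(activity, new Object[] {"top", 50});

        // Let the queued work finish before reading the log.
        ui.shutdown();
        try {
            ui.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }
}
```

In this shape the synchronous entry point returns nothing, which matches the concern raised earlier that functions which used to return a value synchronously may need an API change.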

I'll send you over this version of the SDK to have a play with ... let me know if it's going to be significant effort!

One important point .. can we make it so that it works both ways? I believe if you're already in the UI thread and call runOnUiThread, then it invokes it synchronously then and there? because we are looking to make all this "do everything in the VM on a background thread" an opt-in feature, which means the extensions would need to work when run both on the UI thread and on a background thread........ sorry!


@megajogos I'm impressed that you'd seen a significant difference in the ANRs at start-up, which makes me think it must have been very bad! but it also makes me wonder whether we can address your particular problem without having to resort to all these changes, which I think are quite a high-risk approach. Would it be possible to get an apk-debug (or aab-debug) build of your application, so that we can run it up with Scout and maybe with a debug version of the runtime, to find out what's causing the initial delays?

thanks

megajogos commented 2 years ago

Hi, @ajwfrost

Sure, we will generate it and send it to you. Which version of Adobe AIR would you prefer?

For us the ANR rate is the biggest problem, so we would do anything to solve it.

One of example of our apps: https://play.google.com/store/apps/details?id=air.br.com.megajogos.mobile

marchbold commented 2 years ago

@ajwfrost Sure thing, definitely should be able to handle both cases as you are correct the runOnUiThread call automatically handles each case.
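For reference, Android documents Activity.runOnUiThread as running the action immediately when the caller is already on the UI thread, and posting it to the UI thread's event queue otherwise. The sketch below models that behaviour in plain Java so it can run outside Android; FakeActivity and the "fake-ui" executor are hypothetical stand-ins, not Android or AIR classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Stand-in that mimics the documented runOnUiThread semantics; not an Android class. */
final class FakeActivity {
    private volatile Thread uiThread;
    private final ExecutorService queue = Executors.newSingleThreadExecutor(r -> {
        uiThread = new Thread(r, "fake-ui");
        return uiThread;
    });

    void runOnUiThread(Runnable action) {
        if (Thread.currentThread() == uiThread) {
            action.run();          // already on the UI thread: run inline
        } else {
            queue.execute(action); // otherwise: post to the UI thread's queue
        }
    }

    /** Stops the fake UI thread after its queued work has run. */
    void finish() {
        queue.shutdown();
        try {
            queue.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Calls runOnUiThread from a background caller and again from the UI thread itself. */
    static List<String> demo() {
        FakeActivity activity = new FakeActivity();
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        // First call comes from a non-UI thread, so it is posted.
        activity.runOnUiThread(() -> {
            log.add("outer start");
            // Second call comes from the UI thread, so it runs before "outer end".
            activity.runOnUiThread(() -> log.add("nested"));
            log.add("outer end");
        });
        activity.finish();
        return log;
    }
}
```

Because the nested call runs inline, the same extension code path works whether the VM is on the UI thread (today's model) or on a background thread (the opt-in model discussed above).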

It will be a bit of work for us either way so would have to be something we approach gradually over several months.

megajogos commented 2 years ago

Hi @ajwfrost

This is the aab-debug build using Adobe AIR 33.1.1.856: https://www.megajogos.com.br/filemanager/allgames_android_megajogos_trunk_aab.aab

This app crashes on low-end devices if you tap the screen while the app is loading (before our code runs).

megajogos commented 2 years ago

Any news? @ajwfrost

Have you analyzed our app?

ajwfrost commented 2 years ago

Hi @megajogos - we did a brief bit of analysis just from the existing logs - there are a couple of curious gaps where it seems to be pausing for a little over 1.5s during start-up, so we need to start adding some extra profiling into the runtime to see what it's doing...

megajogos commented 2 years ago

@ajwfrost we published our apps with the Adobe Air 33.1.1.889 and our ANRs decreased a lot. It is still very bad, but it is an improvement.

Our ANR rate was 4.63% with Adobe Air 33.1.1.856 and now is 2.02% with Adobe Air 33.1.1.889.


Please continue working on this issue!

megajogos commented 2 years ago

@ajwfrost do you know what you did in version 33.1.1.889 that caused this impact on the rate of ANRs?

ajwfrost commented 2 years ago

@megajogos between builds 856 and 889 there were a couple of targeted bug-fixes in the Android runtime, but I wouldn't expect them to fix any ANR reports. Which means it could be the compilation step (did you fully rebuild using build 889, vs just re-package the AAB/APK files from the SWFs?). There were two changes:

a) omitting trace statements from the code (by default) in release builds ... not sure if you had a lot of complex trace statements going on at start-up? or maybe an ANE had one?
b) merging ABC blocks in SWC libraries ... this change actually caused some problems with library dependencies so we've had to revert it in the latest release; performance analysis also showed that this only had a minimal effect.

Interestingly, looking at your AAB file from above, the main SWF file there is really large (26MB) but doesn't scan properly via SWF Investigator, which thinks it's got thousands of font definitions and no code... so I can't actually see whether there's anything that could be causing a problem there; there are a lot of ANEs but none of these seem to have an excessive number of ABC blocks so I doubt this was the cause of the improvement.

So it's a curious one!..

thanks