airsdk / Adobe-Runtime-Support

Report, track and discuss issues in Adobe AIR. Monitored by Adobe - and HARMAN - and maintained by the AIR community.

Android ANR caused by graphics conflict with Chromium? #3549

Open ajwfrost opened 2 weeks ago

ajwfrost commented 2 weeks ago

See https://github.com/distriqt/ANE-Adverts/issues/589#issuecomment-2442319673

There's an ANR showing up in the log (attached). Initial thoughts are below:

libc.so.__futex_wait_ex.log


The ANR looks like it's GPU-related:

"main" (tid 1) = Android UI:

  #03  pc 0x0000000000119c13  /system/lib/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::drawFrame+254)
  at android.graphics.HardwareRenderer.nSyncAndDrawFrame (Native method)

"Thread 4" (tid 25) = AIR runtime:

  #11  pc 0x0000000000016233  /system/lib/libEGL.so (android::eglMakeCurrentImpl+330)
  #12  pc 0x0000000000093495  /system/lib/libandroid_runtime.so (android::jni_eglMakeCurrent+160)
  at com.google.android.gles_jni.EGLImpl.eglMakeCurrent (Native method)
  at com.adobe.air.FlashEGL10.MakeGLCurrent (FlashEGL10.java:684)
  at com.adobe.air.customHandler.callTimeoutFunction (Native method)

Which might mean there's a conflict between these two threads ...

Although there's also a Chromium instance in tid 71, which also appears to be waiting:

  #05  pc 0x0000000000017121  /system/lib/libEGL.so (void* android::eglCreateImageTmpl<int, void* >+252)
  #06  pc 0x0000000000017019  /system/lib/libEGL.so (android::eglCreateImageKHRImpl+20)
  #07  pc 0x00000000035b9e81  /data/app/~~UNG6tJ4eOyhRL2v1opXwVQ==/com.google.android.trichromelibrary_666810030-RO1dahlTOceM5BJYt0TDMw==/base.apk (BuildId: 3663fd185c8564e6952860b602a348e3e6b01aa5)

Plus, I'm curious what triggered "JavaBridge" tid=73 Native: this is also in a function that may be blocking.

And then I've just scrolled down further and found there is a "RenderThread" (tid 100) for the Chromium instance which then seems to be somehow calling back into the AIR runtime..?!

  #13  pc 0x0000000000edfb45  /data/app/~~UNG6tJ4eOyhRL2v1opXwVQ==/com.google.android.trichromelibrary_666810030-RO1dahlTOceM5BJYt0TDMw==/base.apk (BuildId: 3663fd185c8564e6952860b602a348e3e6b01aa5)
  #14  pc 0x00000000000025ed  /system/lib/libwebviewchromium_plat_support.so (android::::draw_gl+284)
  #15  pc 0x0000000000130cdd  /system/lib/libhwui.so (android::uirenderer::WebViewFunctor::drawGl+120)
  #16  pc 0x000000000010f135  /system/lib/libhwui.so (android::uirenderer::skiapipeline::GLFunctorDrawable::onDraw+1636)
  #17  pc 0x0000000000182227  /system/lib/libhwui.so (SkDrawable::draw+58)
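Reading a long dump thread by thread is error-prone, so a small scanner can help shortlist which threads are sitting inside the graphics libraries. The following is only a rough sketch: the class name `AnrDumpScan`, the method `graphicsThreads`, and the assumption that each thread starts with a line beginning with its quoted name (as in `"JavaBridge" tid=73 Native`) are illustrative, and the sample lines are abbreviated from the traces above.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnrDumpScan {
    // Assumption: a thread section starts with a line beginning with the quoted thread name.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\".*");
    // Native frames look like: "#11  pc 0x...  /system/lib/libEGL.so (symbol+offset)".
    private static final Pattern FRAME = Pattern.compile("^\\s*#\\d+\\s+pc\\s+\\S+\\s+(\\S+)");

    /** Maps each thread name to the libraries of interest that appear in its native frames. */
    static Map<String, Set<String>> graphicsThreads(List<String> lines, Set<String> libs) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        String current = null;
        for (String line : lines) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = header.group(1);
                continue;
            }
            Matcher frame = FRAME.matcher(line);
            if (current != null && frame.find()) {
                String path = frame.group(1);
                // Keep only the file name, e.g. "libEGL.so".
                String file = path.substring(path.lastIndexOf('/') + 1);
                if (libs.contains(file)) {
                    result.computeIfAbsent(current, k -> new LinkedHashSet<>()).add(file);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "\"main\" tid=1 Native",
            "  #03  pc 0x0000000000119c13  /system/lib/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::drawFrame+254)",
            "\"Thread 4\" tid=25 Native",
            "  #11  pc 0x0000000000016233  /system/lib/libEGL.so (android::eglMakeCurrentImpl+330)");
        // Print every thread that has at least one frame in libEGL.so or libhwui.so.
        graphicsThreads(sample, Set.of("libEGL.so", "libhwui.so"))
            .forEach((thread, found) -> System.out.println(thread + " -> " + found));
    }
}
```

Frames inside `base.apk` (the Chromium library here) are not matched by file name, so those threads would still need to be checked by hand.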

@hadisn are you able to reproduce the ANR yourselves? Do you know if there's anything in particular that's causing the problem e.g. a particular screen when you then tap in a particular place? I think we'll need to look more into this one - what version of the AIR SDK were you using for this?

thanks

hadisn commented 2 weeks ago

Hi @ajwfrost, I was not able to reproduce it and can't pinpoint what is causing the problem. I can tell you that it happens in GPU and Direct render modes (not sure about CPU). I am using AIR 51.1.2.1 and distriqt Adverts ANE v15.3.0. It looks like it happens only on Android 14 (I will try to test more on Android 14).

Here is the logcat from an Android 14 test device (Pixel 8 Pro): Logcat.txt

The problem as described in the Google Play Console: "The main thread is blocked, waiting for the rendering subsystem or the GPU to complete a requested operation. This is usually caused by the slowness of the rendering subsystem, the GPU, or its driver."

Also they point to this link: https://developer.android.com/topic/performance/anrs/find-unresponsive-thread#lock-contention

Regards

jigtrap commented 2 weeks ago

Hi @ajwfrost, @hadisn

My findings are similar to what hadisn is reporting.

I am seeing ANRs in my app (renderMode=direct, runtimeInBackgroundThread=true, AIR SDK 51.1.2.1, Adverts v15.3.0).

Findings:
Filtering to all Android versions except Android 14, the futex_wait ANR affects 1.4% of sessions.
Filtering to Android 14 only, the futex_wait ANR affects 29.4% of sessions.

As we can see, there is a big difference (roughly 21 times higher on Android 14).

I hope it can be solved soon.

Thanks in advance, Aldo

ajwfrost commented 1 week ago

Hi

Quick update here is that we looked again at the stack dumps and where the different threads are, and the issue does seem to be related to the GPU and its usage across different threads. The main thread is trying to render; meanwhile the AIR thread is trying to do a 'make current', and the Chromium webview seems to have one thread trying to render and another thread waiting on something...

About the only thing we can control here is to merge the main/UI thread with the runtime thread, i.e. remove that "runtimeInBackgroundThread" setting. Would it be possible to try that, and see if it impacts the ANR rate? If you're still getting ANRs with that, it would be good if we can get another dump file with the thread stacks to see whereabouts things are hanging. Given it seems to be related to the Android version (14) per the above, I'm wondering if there's a change in the Android WebView component that could be behind this.
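For anyone who wants to capture thread stacks themselves rather than wait for a Play Console report, one common approach is a watchdog that pings the UI thread and dumps all stacks if the ping is not serviced in time. The sketch below is plain Java and only illustrates the idea: `StallWatchdog`, `responds` and `dumpAllStacks` are made-up names, the single-thread executor merely stands in for the main thread, and on Android the ping would presumably be posted via a `Handler` on the main looper instead. It also only shows Java frames, not the native ones seen in the dumps above.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StallWatchdog {
    /**
     * Submits a no-op to the watched executor and waits up to timeoutMs for it to run.
     * Returns true if it ran in time, false if the executor looks stalled.
     */
    static boolean responds(ExecutorService watched, long timeoutMs) throws InterruptedException {
        Future<?> ping = watched.submit(() -> { });
        try {
            ping.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            ping.cancel(false); // do not leave a stale ping queued behind the blocked task
            return false;
        } catch (ExecutionException e) {
            return true; // the ping was executed, which is all we are checking
        }
    }

    /** Formats the Java stacks of all live threads, loosely in the style of an ANR dump. */
    static String dumpAllStacks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("  at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the UI thread: one worker that we deliberately block.
        ExecutorService fakeMain = Executors.newSingleThreadExecutor(r -> new Thread(r, "fake-main"));
        CountDownLatch release = new CountDownLatch(1);
        fakeMain.submit(() -> { release.await(); return null; });

        if (!responds(fakeMain, 200)) {
            System.out.println("fake-main did not respond within 200 ms, dumping stacks:");
            System.out.println(dumpAllStacks());
        }
        release.countDown();
        fakeMain.shutdown();
    }
}
```

A real watchdog would run the check periodically from its own thread and use a timeout shorter than the system's ANR limit, so that the dump is written before the process is killed.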

thanks

Andrew

bobrokrol commented 1 week ago

Screenshot_20241107_160415_Chrome

I also have a spike in these ANRs, mostly on MT6855 and MT6765.

stacktrace.log.txt

I'm using StageWebView in the app.

I have blocked certain devices on Google Play to avoid crossing the bad behavior threshold.

I don't like disabling runtimeInBackgroundThread=true, as this parameter is now more stable than before. Previously it led to a lot of crashes and ANRs. Right now it works fine apart from this bunch of ANRs on certain chips. There is a huge difference in "Excessive slow frames" with the option disabled vs. enabled: 14% vs 3% (which is close to the peer median).

hadisn commented 6 days ago

Hi @ajwfrost, thank you for the update. I agree with @bobrokrol: runtimeInBackgroundThread=true solved a lot of problems except these few, and I believe these will also be fixed.

hadisn commented 3 days ago

Hi @ajwfrost, I see that AIR 51.1.2.2 has been released. Can you tell us whether we should enable or disable runtimeInBackgroundThread with the latest SDK version? In the release notes I can see that you may have solved a problem related to this, but in your comment here you suggest removing runtimeInBackgroundThread, so I am not sure what to do :)

Thank you

ajwfrost commented 3 days ago

Hi

The updates in 51.1.2.2 were around some of the other API calls that seemed to result in crashes when using the background thread model - i.e. an actual crash due to state error, rather than just a hang / ANR like you're seeing here.

So I don't expect this version to change anything regarding the conflict we have here with Chromium. My hope had been that it would be possible to see whether switching back to using the UI thread for AIR would then show whether we have a fundamental problem with Chromium interactions, or whether that was just a side-effect of having the extra thread. But if it causes increased ANRs in other areas without the background mode, then it might be tricky (or counter-productive) to check this.

So currently, we're at the same position: we seem to have an odd conflict when using Chromium (but only on certain chipsets?) and we don't know whether or not it's related to the background runtime mode.

thanks

hadisn commented 3 days ago

I will upload a version with runtimeInBackgroundThread disabled to Google Play and let you know what happens.

hadisn commented 1 day ago

Hi @ajwfrost, two days ago I uploaded a version with runtimeInBackgroundThread disabled and I already see a lot of ANRs on Android 14:

"The main thread is blocked, waiting on a native synchronization routine, such as a mutex."

com.google.android.gles_jni.EGLImpl.eglMakeCurrent.log

ajwfrost commented 1 day ago

Okay thanks -- and it still looks like we have the same problem:

Some interesting information about how Chromium works on Android: https://docs.google.com/document/d/1MLPEmMugdVvfeMeQQN_NMolqs4zZekfKjZeNAQJJnMo/edit?usp=sharing So it sounds like they will always have a single GPU thread, and a separate Render thread, but we appear to be having two separate EGL contexts - one from AIR and one from Chromium WebView - both within the same activity.

This might be the issue: from what I'm reading, it might be that Android needs only a single EGL context for a window/surface. It is (I think) possible to have multiple EGL contexts by having multiple Activities.
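To make the suspected failure mode concrete, here is a toy model in plain Java of two renderers that both need exclusive use of one surface. It does not use the EGL API at all: `SharedSurfaceModel`, `drawWithTimeout` and the thread names are hypothetical, and the lock merely stands in for whatever the driver uses to serialise access. The point it illustrates is that a timed acquire lets the caller notice the stall, whereas an unbounded wait on the main thread would show up as an ANR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class SharedSurfaceModel {
    // Stands in for exclusive access to one window surface; this is not an EGL object.
    private final ReentrantLock surfaceLock = new ReentrantLock();

    /** Runs drawCall while holding the surface, giving up after timeoutMs instead of blocking forever. */
    boolean drawWithTimeout(Runnable drawCall, long timeoutMs) throws InterruptedException {
        if (!surfaceLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // another renderer still owns the surface
        }
        try {
            drawCall.run();
            return true;
        } finally {
            surfaceLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        SharedSurfaceModel surface = new SharedSurfaceModel();
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // The "webview" renderer takes the surface and keeps it until released.
        Thread webview = new Thread(() -> {
            try {
                surface.drawWithTimeout(() -> {
                    holding.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, 1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "webview-render");
        webview.start();
        holding.await(); // wait until the webview thread really owns the surface

        // The "runtime" renderer now tries to draw on the same surface and times out.
        boolean drewWhileHeld = surface.drawWithTimeout(() -> { }, 200);
        System.out.println("runtime drew while webview held the surface: " + drewWhileHeld);

        release.countDown();
        webview.join();
        boolean drewAfterRelease = surface.drawWithTimeout(() -> { }, 200);
        System.out.println("runtime drew after release: " + drewAfterRelease);
    }
}
```

Running this prints `false` for the first attempt and `true` for the second. Whether the real hang has this shape is exactly what is still unknown here.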

So just to check on the use case here:

thanks

hadisn commented 1 day ago

Mintonist commented 19 hours ago

I think different apps may have different use cases, and I don't think we can stop using direct/gpu mode. So regarding a separate activity for the webview: could it be a manifest flag or a code parameter, so that everyone can choose and know the limitations? By the way, what limitations would a separate webview activity have for us?