apache / cordova-android

Apache Cordova Android
https://cordova.apache.org/
Apache License 2.0
3.59k stars 1.52k forks source link

Crash Report: ConcurrentModificationException #924

Closed EinfachHans closed 3 years ago

EinfachHans commented 4 years ago

Bug Report

Problem

We mentioned some Crash Reports from our App, build with cordova-android@8.1.0. Crashlog:

java.util.ConcurrentModificationException: java.util.ConcurrentModificationException

java.util.LinkedHashMap$LinkedHashIterator.nextNode LinkedHashMap.java:775
java.util.LinkedHashMap$LinkedValueIterator.next LinkedHashMap.java:803
org.apache.cordova.PluginManager.onResume PluginManager.java:262
org.apache.cordova.CordovaWebViewImpl.handleResume CordovaWebViewImpl.java:452
org.apache.cordova.CordovaActivity.onResume CordovaActivity.java:277
android.app.Instrumentation.callActivityOnResume Instrumentation.java:1465
android.app.Activity.performResume Activity.java:8203
android.app.ActivityThread.performResumeActivity ActivityThread.java:4757
android.app.ActivityThread.handleResumeActivity ActivityThread.java:4810
android.app.servertransaction.ResumeActivityItem.execute ResumeActivityItem.java:52
android.app.servertransaction.TransactionExecutor.executeLifecycleState TransactionExecutor.java:190
android.app.servertransaction.TransactionExecutor.execute TransactionExecutor.java:105
android.app.ActivityThread$H.handleMessage ActivityThread.java:2373
android.os.Handler.dispatchMessage Handler.java:107
android.os.Looper.loop Looper.java:213
android.app.ActivityThread.main ActivityThread.java:8147
java.lang.reflect.Method.invoke Method.java
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run RuntimeInit.java:513
com.android.internal.os.ZygoteInit.main ZygoteInit.java:1101

and

java.util.ConcurrentModificationException: java.util.ConcurrentModificationException

java.util.LinkedHashMap$LinkedHashIterator.nextNode LinkedHashMap.java:760
java.util.LinkedHashMap$LinkedValueIterator.next LinkedHashMap.java:788
org.apache.cordova.PluginManager.onPause PluginManager.java:209
org.apache.cordova.CordovaWebViewImpl.handlePause CordovaWebViewImpl.java:435
org.apache.cordova.CordovaActivity.onPause CordovaActivity.java:245
android.app.Activity.performPause Activity.java:7663
android.app.Instrumentation.callActivityOnPause Instrumentation.java:1536
android.app.ActivityThread.performPauseActivityIfNeeded ActivityThread.java:4726
android.app.ActivityThread.performPauseActivity ActivityThread.java:4691
android.app.ActivityThread.handlePauseActivity ActivityThread.java:4626
android.app.servertransaction.PauseActivityItem.execute PauseActivityItem.java:45
android.app.servertransaction.TransactionExecutor.executeLifecycleState TransactionExecutor.java:145
android.app.servertransaction.TransactionExecutor.execute TransactionExecutor.java:70
android.app.ActivityThread$H.handleMessage ActivityThread.java:2199
android.os.Handler.dispatchMessage Handler.java:112
android.os.Looper.loop Looper.java:216
android.app.ActivityThread.main ActivityThread.java:7625
java.lang.reflect.Method.invoke Method.java
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run RuntimeInit.java:524
com.android.internal.os.ZygoteInit.main ZygoteInit.java:987

Environment, Platform, Device

From Crashlog this happens to a Mate 20 Pro with Android 9 and a Galaxy Note10+ with Android 10 too.

Maybe related Issue?

Please fix this ๐Ÿ˜Š

Checklist

breautek commented 4 years ago

I believe this is caused because we access these LinkedHashMap objects potentially from different threads and these hash maps are not thread safe.

Note that this implementation is not synchronized. If multiple threads access a linked hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map: Map m = Collections.synchronizedMap(new LinkedHashMap(...));

https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html

However it will be very hard to be confident unless if we can reproduce this reliability. In the above callstacks, it seems to be occurring on the iteration of plugins on responding to a pause or resume event. Unfortunately this isn't necessary the error, it's just tells us that the hash map has changed while we were iterating.

@HansKrywaa can you tell me a little bit of your app? My wild theory is this is related to maybe the webview being unloaded/killed but that would only make sense if your app goes into the background to load up something memory intensive (such as the camera app). Normally under this situation, the webview is suppose to load back up (but the webview environment would be refreshed). I'm thinking there might be a race condition there...

EinfachHans commented 4 years ago

Hey @breautek ,

thanks for your explanation.

About our App: It's a App where you can companionship someone, so basically it's a lot about Background Location and Websockets. I also used a Plugin which allows me to run in the Background, with a Method do disable the WebView Optimization: https://github.com/HansKrywaa/cordova-plugin-advanced-background-mode

Can i give you any more Information?

breautek commented 4 years ago

Nah... the content of your app doesn't sound like it fit well with my theory... How often do you see these crashes in your crashlytics?

EinfachHans commented 4 years ago

It happen like 30 Times now ๐Ÿค”

timbru31 commented 4 years ago

We know this issue happens, without a proper way to reproduce it, it is however very difficult to debug, fix and confirm the fix. :(

breautek commented 4 years ago

To be clear he was just simply responding to my question haha...

If it only happened 30 times since you opened the issue, that's approximately 1 occurrence per day... which means the chances of intentionally triggering the bug naturally will likely be extremely difficult.

I have a feeling that there are potential race conditions in the resume/pause lifecycles but we will likely need a controlled test case to prove it...

Or if you're willing... I can branch off of 8.1 and implement synchronizedMap and you can fork my branch. We'll see if the problem goes away. If it is a race condition, the java docs appears to suggest that synchronizedMap should fix it by ensuring synchronised access to our plugin maps. I'm assuming this is at a performance cost though and I'm not sure how significant that would be...

EinfachHans commented 4 years ago

The App is a โ€žbig" one, from one of the biggest Insurances from Germany. I'm the developer, but i can't just say: Yes we make it like this, if we don't know how the performance is losing. ๐Ÿค”

But in general this sounds like a good idea, is this fork much work for you?

breautek commented 4 years ago

I don't think it's that much work, I think it's literally just wrapping our Linked Hash Maps inside a synchronizedMap and using the wrapping the map iterations inside a synchronized block.

I'll probably start this tonight, or if you like to tackle this yourself, feel free.

EinfachHans commented 4 years ago

I currently won't have the time ๐Ÿ˜• Please message me if you are ready, so i can try it out and maybe use it in the next Release. Will we be able to test performance somehow?

breautek commented 4 years ago

I think all iterations of the plugin map needs to be synchronized, so logically I think I believe if you have a large number of plugins, you'll see a bigger impact. I'll test by installing all the apache plugins + maybe a few more and try to compare apps on cordova-android@8.1 vs the fork.

This is all just speculation, so we'll see just how big of the impact is. The impact could be insignificant altogether as well, for all we know...

breautek commented 4 years ago

@HansKrywaa I have a branch at https://github.com/breautek/cordova-android/commits/concurrent-plugin-manager-8.1

I would advise you to fork cordova-android and merge my branch into your own fork (that way you have complete control)

Then you can install the platform via

cordova platform add https://github./com/yourgitrepo.git#your-branch

Just like any cordova-android update, you'll need to remove the platform first by doing: cordova platform remove android

You can look at the change set at https://github.com/breautek/cordova-android/commit/bfda452a09ba26412e0f527f6af1957379d89453

Upon further reading, it looks like I didn't need to wrap everything inside a synchronized block, unless if you use the iterators, which the codebase does not. So using Collections.synchronizedMap should be enough, but again, hard to say for certain when we lack a reproducible test case.

I saw no significant performance impact with my test app, but obviously my test app wasn't a real production app. It was just the cordova hello world app with all of the apache plugins installed. I'll leave it up to you if you see any significant performance impacts on your app, which would represent a real-world scenario.

Given how frequent the crash appears to occur... with any luck, you won't see the crash for the next 30 days or so...? :crossed_fingers:

Please do keep me updated.

EinfachHans commented 4 years ago

Thanks for your work, i will do like you told. I'm not sure when we will build the next Release Version, but i will use the Fork in this and ping you here for your Information! ๐Ÿ˜Š

EinfachHans commented 4 years ago

So just to let you know: We release a new Version with the Fork included tomorrow ๐ŸŽ‰

Currently we had two Versions released before. Within these we had for now over 35 Crash Reports (nearly daily).

EinfachHans commented 4 years ago

Just got this Crash: java.util.ConcurrentModificationException


java.util.LinkedHashMap$LinkedHashIterator.nextNode LinkedHashMap.java:760
java.util.LinkedHashMap$LinkedValueIterator.next LinkedHashMap.java:788
org.apache.cordova.PluginManager.postMessage PluginManager.java:314
org.apache.cordova.CordovaWebViewImpl$EngineClient$1$1.run CordovaWebViewImpl.java:558
android.os.Handler.handleCallback Handler.java:907
android.os.Handler.dispatchMessage Handler.java:105
android.os.Looper.loop Looper.java:216
android.app.ActivityThread.main ActivityThread.java:7625
java.lang.reflect.Method.invoke Method.java
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run RuntimeInit.java:524
com.android.internal.os.ZygoteInit.main ZygoteInit.java:987

I think it is the same?

breautek commented 4 years ago

Hmm, looks like it, just in a different area. But it's still inside a loop like the original 2 stacktraces. Using synchronizedMap had a note that if using Iterator, then we need to wrap it in a synchronized block.

We don't use iterators directly, we use a for in block... so I didn't think we needed the synchronized block.... but I guess behind-the-scenes it still does use java.util.LinkedHashMap$LinkedValueIterator so maybe we do need to wrap every for...in block inside a synchronized {}...

And I trust that your new stacktrace crash came from an updated version of the app that contains the https://github.com/breautek/cordova-android/commit/bfda452a09ba26412e0f527f6af1957379d89453 patch?

EinfachHans commented 4 years ago

Like you said, i forked your branch (See my Fork) and use this as the android platform. It is definitely used

breautek commented 4 years ago

It's important to determine that the crash came from someone who has actually updated their app to the version that includes your fork.

If the crash came from someone who was using the older version that we already know was broken because they simply have not updated then the stack trace doesn't tell us anything new.

EinfachHans commented 4 years ago

Yeah i can see that the User with the Crash is using Version 2.0.3 which is the one we build today

EinfachHans commented 4 years ago

oh sorry sorry sorry my bad. It was Version 2.0.2, i was confused because like we have 30-40 other crashes right now from other stuff... My bad i'm sorry

breautek commented 4 years ago

Ah ok, and does 2.0.2 includes your fork or was that only added in 2.0.3?

EinfachHans commented 4 years ago

no the Fork only was includes since 2.0.3, so we are running good at this front at the moment ๐Ÿ˜‚ Sorry for confusing

breautek commented 4 years ago

No problem :smile: so as far as we know, the synchronizedMap wrapper is (hopefully) working. If you see a stack trace like that comes from 2.0.3+ then we still have a problem.

EinfachHans commented 4 years ago

Ok but know really. I got the same Crash for the Version with the Fork... ๐Ÿ˜• This Time really^^

Huawei P30 Lite (Android 9)

breautek commented 4 years ago

Ok... well that means I think every loop block needs to be inside a synchronized block... but honestly I haven't found any resources that says that is necessary, as every example deals with using the Iterator class... so this is just another assumption... The PluginManager has a lot of methods that does iterations over it's two maps.

I'm not sure why for...in loops was chosen, I assumed for simplicity sake...

EinfachHans commented 4 years ago

But the "default" Crash didn't happen so far. So how should we continue?

EinfachHans commented 4 years ago

Hey again,

any Updates on this?

breautek commented 3 years ago

Sorry, I got busy.

But the "default" Crash didn't happen so far. So how should we continue?

What do you mean by the "default" crash?

EinfachHans commented 3 years ago

I mean the Crash the Issue was about. So at java.util.LinkedHashMap$LinkedHashIterator.nextNode LinkedHashMap.java:775

But that is not true, this Crash still happens.

breautek commented 3 years ago

Ok, I'm going to have to add the Info needed tag. I'm not sure when I'll be able to get back to this but we will going to need to find a way to reliably reproduce the crash in a sample app.

I think it's obvious that this issue is caused by a race condition so that it is going to be very difficult to reproduce this in a simple sample app, so the sample app may need to have modified framework code that forces the app to reliably trigger java.util.ConcurrentModificationException.

EinfachHans commented 3 years ago

@breautek Did you see the PR - will this fix this? We still have many crashes with this

ebhsgit commented 3 years ago

@breautek @EinfachHans I've fixed this issue in my own fork.

The issue is caused by cordova plugin's that use different threads to perform there executions. When they complete their action, some times the iterators are changed, causing the ConcurrentModificationException. Example plugins: Cordova Google Map, Splashscreen, etc...

My initial thought was using Collection.synchronizedMap was sufficient, but the exception still occurs in productions.

Only by wrapping all the for loops inside synchronized keyword fixes the issue.

Note the downside of doing this is that your app load may be abit slow as now the plugins different threads need to wait for the threads of another plugin.

The 2 changesets that fix this problem. Feel free to use as you see fit.

https://github.com/8bhsolutions/cordova-android/commit/9d322237108dfdff391f67a9c7c78f9279866417 https://github.com/8bhsolutions/cordova-android/commit/1841789e1afcba592a20ccf03ddf33149426527f

breautek commented 3 years ago

My initial thought was using Collection.synchronizedMap was sufficient, but the exception still occurs in productions.

Are you saying the PR https://github.com/apache/cordova-android/pull/1073 did not address the issue for you?

Note the downside of doing this is that your app load may be abit slow as now the plugins different threads need to wait for the threads of another plugin.

Personally I don't see a way around that if unrelated plugin code is causing conflicts. My motto is make the framework work, then find a way to optimise later if necessary. So I think that's okay for now.

The 2 changesets that fix this problem. Feel free to use as you see fit.

A PR would be appreciated. I was never able to reproduce this issue myself, and the combination of plugins in my production apps doesn't appear to trigger this issue so I'm relying on your observations. If you can prepare a branch with those commits and create a PR, it will make it easy for @EinfachHans to try it on his own apps to see if it resolves the issue for him.

ebhsgit commented 3 years ago

I didn't use PR #1073, so can't comment whether it works or not. On my first attempt to fix the issue, I only changed the collection to Collection.synchronizedMap, and did not change the for loop to foreach

I can confirm that just changing to using the synchronizedMap does not fix the issue.

I will add that even if PR #1073 works, it will not fix all ConcurrentModificationException, because CME occurs on other methods as well. It's just that PostMessage has the highest occurrence because splashscreen plugin invokes it during app load, and that is when all other plugins are also loading at the same time.

ebhsgit commented 3 years ago

Not sure why the PR #1091 failed macOs tests....

breautek commented 3 years ago

I can confirm that just changing to using the synchronizedMap does not fix the issue.

Yah, I provided a test fix for @EinfachHans awhile back that simply changed the lists to a synchronizedMap since it sounds like that was all that was required to be thread-safe according the docs, but EinfachHans reported the same result.

I will add that even if PR #1073 works, it will not fix all ConcurrentModificationException, because CME occurs on other methods as well.

Glancing over the PR, it looks like you've updated all methods to be synchronized?

Not sure why the PR #1091 failed macOs tests....

I don't think it's the fault of the PR, it appears to be a timeout issue, I can try restarting the tests.

ebhsgit commented 3 years ago

Glancing over the PR, it looks like you've updated all methods to be synchronized?

Yes, because in my app, I had CME occurring on the other functions too.

EinfachHans commented 3 years ago

So whenever you say, i can use a Fork in the next release of our app. Because this is currently only supported in production (often) and then check for some month/weeks if the crash appears.

But because of this and because our app is very "important" (emergenc app in cooperation with german police) and we have many users the fork should be really safe to use ๐Ÿ˜ฌ