NordicSemiconductor / Android-nRF-Mesh-Library

The Bluetooth Mesh Provisioner and Configurator library.
https://www.nordicsemi.com/
BSD 3-Clause "New" or "Revised" License
414 stars 177 forks source link

Potential ConcurrentException in MeshNetworkDb:InsertNetworkAsyncTask() #432

Closed Mavericksb closed 2 years ago

Mavericksb commented 3 years ago

Hello, I'd like to report a second bug found during importing a MeshNetwork (The first one and solution has been post under BubblyNetDev post named "Migration issue in 3.1.5", not sure if it has been read and is planned to be fixed in next releases).

By the way, In class MeshNetworkDb.Java

Problem :
In class :

private static class InsertNetworkAsyncTask extends AsyncTask<Void, Void, Void> {
    [...]
    if (!meshNetwork.nodes.isEmpty()) {
        nodesDao.insert(meshNetwork.nodes);
    }

    if (meshNetwork.groups != null) {
        groupsDao.insert(meshNetwork.groups);
    }

    if (meshNetwork.scenes != null) {
        scenesDao.insert(meshNetwork.scenes);
    }
    [...]
}

This can cause troubles down the line, in our case it caused several (still rare) ConcurrentExceptions :

 Fatal Exception: java.lang.RuntimeException: An error occurred while executing doInBackground()
        at android.os.AsyncTask$4.done(AsyncTask.java:415)
      ...
 Caused by java.util.ConcurrentModificationException
        at java.util.ArrayList$Itr.next(ArrayList.java:860)
        at androidx.room.EntityInsertionAdapter.insert(EntityInsertionAdapter.java:95)
        at no.nordicsemi.android.meshprovisioner.data.ProvisionedMeshNodesDao_Impl.insert(ProvisionedMeshNodesDao_Impl.java:303)
        at no.nordicsemi.android.meshprovisioner.MeshNetworkDb$InsertNetworkAsyncTask.doInBackground(MeshNetworkDb.java:307)
        at no.nordicsemi.android.meshprovisioner.MeshNetworkDb$InsertNetworkAsyncTask.doInBackground(MeshNetworkDb.java:271)

Cause : meshNetwork.nodes meshNetwork.groups and meshNetwork.scenes are arrayList. They can be modified from outside the MeshNetworkDb.java for any reason, it shouldn't happen, but if it happens this leads to ConcurrentException.

Solution I fixed the problem cloning the arrayList before passing it to the Dao, this makes sure we never pass an array which could be potentially modified from outside:

    if (!meshNetwork.nodes.isEmpty()) {
        ArrayList<ProvisionedMeshNode> copyNodes = new ArrayList<>(meshNetwork.nodes);
        nodesDao.insert(copyNodes);
    }
roshanrajaratnam commented 3 years ago

Hi, thanks for reporting this. Asynctask has been deprecated in the library. However like you mentioned the migration may cause an issue. I will take a look.

saty9 commented 3 years ago

We are hitting this repeatedly when making multiple changes to the network in quick succession

FluffyBunniesTasteTheBest commented 2 years ago

Any update on this issue? A few of our users are facing something quite similar:

java.util.ConcurrentModificationException: 
  at java.util.ArrayList$Itr.next (ArrayList.java:860)
  at androidx.room.EntityInsertionAdapter.insert (EntityInsertionAdapter.java:95)
  at no.nordicsemi.android.mesh.data.ProvisionersDao_Impl.insert (ProvisionersDao_Impl.java:167)
  at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$insertNetwork$0 (MeshNetworkDb.java:149)
  at no.nordicsemi.android.mesh.-$$Lambda$MeshNetworkDb$cvDKr66SNZXIthGwE9B7B-3EKhs.run (Unknown Source:16)
Mavericksb commented 2 years ago

Any update on this issue? A few of our users are facing something quite similar: at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$insertNetwork$0 (MeshNetworkDb.java:149)

What is the line of code at MeshNetworkDb.java:149 ? If there is an array passed to an insert() method, just copy the array first, and then pass the copy to the insert method instead of the original array, to avoid concurrency.

roshanrajaratnam commented 2 years ago

@FluffyBunniesTasteTheBest sorry, been a bit busy with some other project and I just got back from my paternity leave. I will start looking in to these one by one!

FluffyBunniesTasteTheBest commented 2 years ago

I just got back from my paternity leave.

Wow, nice! Congratulations!! I hope the little one is doing well.

Beside of just copying the ArrayList, I've also delayed the call that's causing the ConcurrentModificationException.

Google Play, btw., also logged some related SQLiteConstraintExceptions:

android.database.sqlite.SQLiteConstraintException: 
  at android.database.sqlite.SQLiteConnection.nativeExecuteForLastInsertedRowId (Native Method)
  at android.database.sqlite.SQLiteConnection.executeForLastInsertedRowId (SQLiteConnection.java:938)
  at android.database.sqlite.SQLiteSession.executeForLastInsertedRowId (SQLiteSession.java:790)
  at android.database.sqlite.SQLiteStatement.executeInsert (SQLiteStatement.java:88)
  at androidx.sqlite.db.framework.FrameworkSQLiteStatement.executeInsert (FrameworkSQLiteStatement.java:51)
  at androidx.room.EntityInsertionAdapter.insert (EntityInsertionAdapter.java:64)
  at no.nordicsemi.android.mesh.data.ProvisionerDao_Impl.insert (ProvisionerDao_Impl.java:184)
  at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$insert$14 (MeshNetworkDb.java:260)
  at no.nordicsemi.android.mesh.-$$Lambda$MeshNetworkDb$UgSLCp-hJGqhdS21yNHcIzr_7_Q.run (Unknown Source:4)

Problem with those exceptions is that neither of them is reproducible with any of my test devices, but they do effect about a 5% of our users, which lead to quite some 1 star reviews on Google Play. Speaking of which, it would be very useful to have an occasional bugfix release every now and then, so we don't have to cherry-pick though all the commits and build the library our self, which would be quite a time safer. Anyway, thanks for your help and all the best to you and your family.

Mavericksb commented 2 years ago

Beside of just copying the ArrayList, I've also delayed the call that's causing the ConcurrentModificationException.

Ok, BTW adding a delay shouldn't be necessary.

Google Play, btw., also logged some related SQLiteConstraintExceptions:

android.database.sqlite.SQLiteConstraintException: ...

SQLiteConstrainException isn't related to the previous problem, it means that you are passing an Object (or list of Objects) that violate the constraints that are defined in Table constructor. As far as I can tell you are trying to insert a Provisioner Node. It's hard to help without any more detail on the Table you are trying to write on, its definition, and the objects you are passing. One of the most common causes (but it is just an example) could be that the Provisioner Table defined in MeshNetworkDb class needs an UNIQUE Key, and you are trying to insert an object (A Provisioner Node in this case) with an UNIQUE ID which already exists in the Table.

You need to do some debugging to find out where the problem lies. I suspect there is some circumstance where you don't clear the Provisioner table before inserting a new Provisioner Node or you don't change the Unique ID for the new one. Honestly I didn't face it on our App, sorry I can't help more than this.

FluffyBunniesTasteTheBest commented 2 years ago

@Mavericksb Thank you very much for the clarification!

As far as I can tell you are trying to insert a Provisioner Node.

Yes. During startup, the app creates a bunch of provisioners, to enable multiple app instances to operate the mesh.

It's hard to help without any more detail on the Table you are trying to write on, its definition, and the objects you are passing.

That's all handled by the library, but...

SQLiteConstrainException isn't related to the previous problem, it means that you are passing an Object (or list of Objects) that violate the constraints that are defined in Table constructor. ... One of the most common causes (but it is just an example) could be that the Provisioner Table defined in MeshNetworkDb class needs an UNIQUE Key, and you are trying to insert an object (A Provisioner Node in this case) with an UNIQUE ID which already exists in the Table.

... that perfectly explains the cause of the SQLiteConstraintException:

If the app crashes because of a ConcurrentException, it tries to re-create the provisioners again during the next start, which, the way it looks, then causes SQLiteConstrainException because some of the provisioners already exist.

Thank you very much for sharing this! You've just saved me from quite some sleepless nights...

FluffyBunniesTasteTheBest commented 2 years ago

Just for the records, there's one more situation where the ConcurrentModificationException can occur and crash the app:

java.util.ConcurrentModificationException: 
  at java.util.ArrayList$Itr.next (ArrayList.java:860)
  at androidx.room.EntityDeletionOrUpdateAdapter.handleMultiple (EntityDeletionOrUpdateAdapter.java:86)
  at no.nordicsemi.android.mesh.data.ProvisionerDao_Impl.update (ProvisionerDao_Impl.java:232)
  at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$16 (MeshNetworkDb.java:276)
  at no.nordicsemi.android.mesh.-$$Lambda$MeshNetworkDb$kyOqCg3Q_F167aHhj0mXWofVRzM.run (Unknown Source:4)
  at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1167)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:641)
  at java.lang.Thread.run (Thread.java:920)
roshanrajaratnam commented 2 years ago

@FluffyBunniesTasteTheBest are you facing this issue on the latest version as well?

ghost commented 2 years ago

To answer your question @roshanrajaratnam, yes it occurs on latest version (3.1.6). Actually we have been facing ConcurrentModficationException crashes pretty much from the beginning of our usage of this library (smth like v2.4 was the first version we used)

The places where such crash can happen :

We could not find time to try to fix this issue as we mostly see it only when using ConfigNodeReset, which is not a feature our end users use a lot. But we have been collecting stack traces for future work. I may post them here if you want to see what are our crash occurrences ?

roshanrajaratnam commented 2 years ago

I will use the approach that @Mavericksb has suggested since it's a working solution. All updates and inserts will be done after copying the data in to a new list.

roshanrajaratnam commented 2 years ago

So while debugging I found that the reason for this to happen only in the above mentioned Dao objects is that, an unmodifiable list not being passed for the insert operation.

FluffyBunniesTasteTheBest commented 2 years ago

@roshanrajaratnam yes, happens with 3.1.6.

ghost commented 2 years ago

Hello @roshanrajaratnam ! So I've pulled the new release to test the fixes, and it seems better, thanks for the update šŸ„³ Though the only place I flagged as "maybe our fault" is the one I can still get lol.. Of course you didn't apply the fix to the network exclusions list (by my fault?). But actually, I think it needs the fix too. (here)

V/MeshManagerApi(13645): Received network pdu: 0x0076CBD44F998565E7172186B5E3E4DC7C33FB2F5C
V/BaseMeshMessageHandler(13645): Sequence number of received Network PDU: 23
V/NetworkLayer(13645): TTL for received message: 127
V/NetworkLayer(13645): Dst: 0x499C
D/LowerTransportLayer(13645): IV Index of received message: 0
D/LowerTransportLayer(13645): SeqAuth: 23
V/AccessLayer(13645): Received Access PDU 804A
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(13645): onNetworkUpdated
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(13645): onNetworkUpdated
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(13645): onNetworkUpdated
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(13645): onNetworkUpdated
D/DoozMeshStatusCallbacks(13645): onMeshMessageReceived
D/DoozMeshStatusCallbacks(13645): received a ConfigNodeResetStatus
E/AndroidRuntime(13645): FATAL EXCEPTION: pool-21-thread-1
E/AndroidRuntime(13645): Process: fr.dooz.app, PID: 13645
E/AndroidRuntime(13645): java.util.ConcurrentModificationException
E/AndroidRuntime(13645):    at java.util.ArrayList$Itr.next(ArrayList.java:860)
E/AndroidRuntime(13645):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:96)
E/AndroidRuntime(13645):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:61)
E/AndroidRuntime(13645):    at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
E/AndroidRuntime(13645):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:208)
E/AndroidRuntime(13645):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:145)
E/AndroidRuntime(13645):    at com.google.gson.Gson.toJson(Gson.java:735)
E/AndroidRuntime(13645):    at com.google.gson.Gson.toJson(Gson.java:714)
E/AndroidRuntime(13645):    at com.google.gson.Gson.toJson(Gson.java:669)
E/AndroidRuntime(13645):    at com.google.gson.Gson.toJson(Gson.java:649)
E/AndroidRuntime(13645):    at no.nordicsemi.android.mesh.MeshTypeConverters.networkExclusionsToJson(MeshTypeConverters.java:157)
E/AndroidRuntime(13645):    at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$3(MeshNetworkDb.java:200)
E/AndroidRuntime(13645):    at no.nordicsemi.android.mesh.-$$Lambda$MeshNetworkDb$lIm-UMtxdFiQaNxV0ONcZDAI6eo.run(Unknown Source:4)
E/AndroidRuntime(13645):    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
E/AndroidRuntime(13645):    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
E/AndroidRuntime(13645):    at java.lang.Thread.run(Thread.java:764)
E/AndroidRuntime(13645): FATAL EXCEPTION: pool-21-thread-2
E/AndroidRuntime(13645): Process: fr.dooz.app, PID: 13645
E/AndroidRuntime(13645): java.util.ConcurrentModificationException
E/AndroidRuntime(13645):    at java.util.ArrayList$Itr.next(ArrayList.java:860)
E/AndroidRuntime(13645):    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1084)
E/AndroidRuntime(13645):    at androidx.room.EntityDeletionOrUpdateAdapter.handleMultiple(EntityDeletionOrUpdateAdapter.java:86)
E/AndroidRuntime(13645):    at no.nordicsemi.android.mesh.data.ProvisionedMeshNodesDao_Impl.update(ProvisionedMeshNodesDao_Impl.java:305)
E/AndroidRuntime(13645):    at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$6(MeshNetworkDb.java:227)
E/AndroidRuntime(13645):    at no.nordicsemi.android.mesh.-$$Lambda$MeshNetworkDb$P_OWMUQ0TcsIRsw0eLJvpw-4dpE.run(Unknown Source:16)
E/AndroidRuntime(13645):    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
E/AndroidRuntime(13645):    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
E/AndroidRuntime(13645):    at java.lang.Thread.run(Thread.java:764)

PS: I'm not getting it every time so it seems to be a sporadic issue šŸ¤” Now, most of the time I can receive ConfigNodeResetStatus without any issue, so again : thanks for the update šŸ‘

roshanrajaratnam commented 2 years ago

Thanks for your feedback. You are correct I did forget to include network exclusions in the latest release.

roshanrajaratnam commented 2 years ago

Could you try this please?

ghost commented 2 years ago

I just had it again :/ Will try more times again to see

roshanrajaratnam commented 2 years ago

ok, I haven't released yet. Please try it and let me know thanks.

ghost commented 2 years ago

This time, I had it when force deleting node (ie. not sending ConfigNodeReset)

D/DoozMeshNetwork(19440): should delete the nodeId : 18960
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(19440): onNetworkUpdated
E/AndroidRuntime(19440): FATAL EXCEPTION: pool-21-thread-1
E/AndroidRuntime(19440): Process: fr.dooz.app, PID: 19440
E/AndroidRuntime(19440): java.util.ConcurrentModificationException
E/AndroidRuntime(19440):    at java.util.ArrayList$Itr.next(ArrayList.java:860)
E/AndroidRuntime(19440):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:96)
E/AndroidRuntime(19440):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:61)
E/AndroidRuntime(19440):    at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
E/AndroidRuntime(19440):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:208)
E/AndroidRuntime(19440):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:145)
E/AndroidRuntime(19440):    at com.google.gson.Gson.toJson(Gson.java:735)
E/AndroidRuntime(19440):    at com.google.gson.Gson.toJson(Gson.java:714)
E/AndroidRuntime(19440):    at com.google.gson.Gson.toJson(Gson.java:669)
E/AndroidRuntime(19440):    at com.google.gson.Gson.toJson(Gson.java:649)
E/AndroidRuntime(19440):    at no.nordicsemi.android.mesh.MeshTypeConverters.networkExclusionsToJson(MeshTypeConverters.java:157)
E/AndroidRuntime(19440):    at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$3(MeshNetworkDb.java:200)
E/AndroidRuntime(19440):    at no.nordicsemi.android.mesh.-$$Lambda$MeshNetworkDb$lIm-UMtxdFiQaNxV0ONcZDAI6eo.run(Unknown Source:4)
E/AndroidRuntime(19440):    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
E/AndroidRuntime(19440):    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
E/AndroidRuntime(19440):    at java.lang.Thread.run(Thread.java:764)

my exclusion list is pretty long (like 100+ addresses), maybe it can impact the async task ?

ghost commented 2 years ago

I gotta move to some others tasks. I will test again tomorrow or next week šŸ‘

roshanrajaratnam commented 2 years ago

But the library has deprecated asynctasks in the DB operations. I will debug more. Thanks for testing.

roshanrajaratnam commented 2 years ago

@R0m4in-dooz could you export and share your network with me? I can test against your large exclusion list!

ghost commented 2 years ago

Sure, there you go. Formatted the Network JSON file uploaded as a text file due to github limitations.

network.txt

ghost commented 2 years ago

well, bad news : I had crash upon node reset status and it didn't occur on network exclusions handling

V/MeshManagerApi(22342): Received network pdu: 0x00762B15ED7007453044CDCCD444D691BA06F89C5D
V/BaseMeshMessageHandler(22342): Sequence number of received Network PDU: 114741
V/NetworkLayer(22342): TTL for received message: 127
V/NetworkLayer(22342): Dst: 0x61F3
D/LowerTransportLayer(22342): IV Index of received message: 0
D/LowerTransportLayer(22342): SeqAuth: 114741
V/AccessLayer(22342): Received Access PDU 804A
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(22342): onNetworkUpdated
I/chatty  (22342): uid=10311(u0_a311) fr.dooz.app identical 2 lines
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(22342): onNetworkUpdated
D/DoozMeshStatusCallbacks(22342): onMeshMessageReceived
D/DoozMeshStatusCallbacks(22342): received a ConfigNodeResetStatus
E/AndroidRuntime(22342): FATAL EXCEPTION: pool-21-thread-2
E/AndroidRuntime(22342): Process: fr.dooz.app, PID: 22342
E/AndroidRuntime(22342): java.util.ConcurrentModificationException
E/AndroidRuntime(22342):    at java.util.ArrayList$Itr.next(ArrayList.java:860)
E/AndroidRuntime(22342):    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1084)
E/AndroidRuntime(22342):    at androidx.room.EntityDeletionOrUpdateAdapter.handleMultiple(EntityDeletionOrUpdateAdapter.java:86)
E/AndroidRuntime(22342):    at no.nordicsemi.android.mesh.data.ProvisionedMeshNodesDao_Impl.update(ProvisionedMeshNodesDao_Impl.java:305)
E/AndroidRuntime(22342):    at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$6(MeshNetworkDb.java:227)
E/AndroidRuntime(22342):    at no.nordicsemi.android.mesh.-$$Lambda$MeshNetworkDb$P_OWMUQ0TcsIRsw0eLJvpw-4dpE.run(Unknown Source:16)
E/AndroidRuntime(22342):    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
E/AndroidRuntime(22342):    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
E/AndroidRuntime(22342):    at java.lang.Thread.run(Thread.java:764)

I didn't carefully check my network JSON maybe there is an issue inside, but i don't see why. Plus this crash not only occur with my network as we can see some users are facing it thanks to monitoring tools.. image (this crash event occured on v3.1.6 of this library)

roshanrajaratnam commented 2 years ago

Any node that is reset gets added to the exclusion list. I still haven't had time to test against your json file. I will push a fix, could you test it for me on your network?

FluffyBunniesTasteTheBest commented 2 years ago

Two of our (~500 active) users experience the same crash @R0m4in-dooz reported. Google Play console shows:

java.util.ConcurrentModificationException: 
  at java.util.ArrayList$Itr.next (ArrayList.java:860)
  at java.util.Collections$UnmodifiableCollection$1.next (Collections.java:1089)
  at androidx.room.EntityDeletionOrUpdateAdapter.handleMultiple (EntityDeletionOrUpdateAdapter.java:86)
  at no.nordicsemi.android.mesh.data.ProvisionedMeshNodesDao_Impl.update (ProvisionedMeshNodesDao_Impl.java:317)
  at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$6 (MeshNetworkDb.java:233)
  at no.nordicsemi.android.mesh.MeshNetworkDb$$ExternalSyntheticLambda2.run (Unknown Source:16)
  at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1167)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:641)
  at java.lang.Thread.run (Thread.java:923)

Beside that, Google Play also logged a few SQLiteConstraintException that could be related - both exceptions have a call of ProvisionedMeshNodeDao_Impl.insert() in their stack trace.

android.database.sqlite.SQLiteConstraintException: 
  at android.database.sqlite.SQLiteConnection.nativeExecuteForLastInsertedRowId (Native Method)
  at android.database.sqlite.SQLiteConnection.executeForLastInsertedRowId (SQLiteConnection.java:1217)
  at android.database.sqlite.SQLiteSession.executeForLastInsertedRowId (SQLiteSession.java:790)
  at android.database.sqlite.SQLiteStatement.executeInsert (SQLiteStatement.java:88)
  at androidx.sqlite.db.framework.FrameworkSQLiteStatement.executeInsert (FrameworkSQLiteStatement.java:51)
  at androidx.room.EntityInsertionAdapter.insert (EntityInsertionAdapter.java:64)
  at no.nordicsemi.android.mesh.data.ProvisionedMeshNodeDao_Impl.insert (ProvisionedMeshNodeDao_Impl.java:301)
  at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$insert$19 (MeshNetworkDb.java:315)
  at no.nordicsemi.android.mesh.MeshNetworkDb$$ExternalSyntheticLambda9.run (Unknown Source:4)
  at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1167)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:641)
  at java.lang.Thread.run (Thread.java:923)
roshanrajaratnam commented 2 years ago

@FluffyBunniesTasteTheBest did you test against version 3.1.7? Although I have not had time to test the the reset issue @R0m4in-dooz has faced yet.

FluffyBunniesTasteTheBest commented 2 years ago

@roshanrajaratnam Thanks for your help!

did you test against version 3.1.7?

Not yet, the current app in production is still based on 3.1.6, but I've merged some of the fixes between 3.1.6 and 3.1.7 (one of them was 9bd1a82, so it should behave the same).

Unfortunately I don't even know what triggered the crash. Problem is that privacy is very important to our target audience, so there's no analytics/crash reporting implemented other than what Android provides, which, unfortunately, keep us clueless about how to provoke the issue. Anyway, I'll publish an update with 3.1.7 next week.

FluffyBunniesTasteTheBest commented 2 years ago

@roshanrajaratnam Thanks for your help!

Unfortunately it happens with 3.1.7 as well. This time from line 195 at ProvisionersDao_Impl.java. The previous crash happened at line 317 at ProvisionedMeshNodesDao_Impl.java.

java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.next(ArrayList.java:860)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1089)
    at androidx.room.EntityInsertionAdapter.insert(EntityInsertionAdapter.java:95)
    at no.nordicsemi.android.mesh.data.ProvisionersDao_Impl.insert(ProvisionersDao_Impl.java:159)
    at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$insertNetwork$0(MeshNetworkDb.java:151)
    at no.nordicsemi.android.mesh.MeshNetworkDb$$ExternalSyntheticLambda29.run(Unknown Source:16)

A little background info on our app: When the app starts for the first time it creates 15 additional provisioners in rapid succession - which, on some devices, triggeres this ConcurrentModificationException and causes a crash. The app needs those additional provisioners to enable multiple users to operate the mesh network. It's most likely a rather rare use case...

Would be nice if this could be fixed soon.

roshanrajaratnam commented 2 years ago

@FluffyBunniesTasteTheBest will take a look at this issue today and get back to you!

FluffyBunniesTasteTheBest commented 2 years ago

@roshanrajaratnam Thanks for your help!

Similar to the ConcurrentModificationException when adding nodes, the ConcurrentModificationException when adding additional provisioners causes a follow up SQLiteConstraintException:

android.database.sqlite.SQLiteConstraintException: FOREIGN KEY constraint failed (code 787 SQLITE_CONSTRAINT_FOREIGNKEY)
        at android.database.sqlite.SQLiteConnection.nativeExecuteForLastInsertedRowId(Native Method)
        at android.database.sqlite.SQLiteConnection.executeForLastInsertedRowId(SQLiteConnection.java:879)
        at android.database.sqlite.SQLiteSession.executeForLastInsertedRowId(SQLiteSession.java:790)
        at android.database.sqlite.SQLiteStatement.executeInsert(SQLiteStatement.java:88)
        at androidx.sqlite.db.framework.FrameworkSQLiteStatement.executeInsert(FrameworkSQLiteStatement.java:51)
        at androidx.room.EntityInsertionAdapter.insert(EntityInsertionAdapter.java:64)
        at no.nordicsemi.android.mesh.data.ProvisionerDao_Impl.insert(ProvisionerDao_Impl.java:176)
        at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$insert$14(MeshNetworkDb.java:262)
        at no.nordicsemi.android.mesh.MeshNetworkDb$$ExternalSyntheticLambda14.run(Unknown Source:4)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
        at java.lang.Thread.run(Thread.java:919)

Unfortunately all four of those exceptions crash the app before the app can catch them. If a user is effected by it, the only way to fix it is to uninstall and reinstall the app and hope it's not happening again.

I've first tried to fix/workaround it by calling MeshNetwork.getProvisioners(); and adding the new provisioner only if there's no other provisioner with the same unicast start range, but it didn't work - maybe the preceding crash caused some issues in the database? I don't know...

At the end, the only solution that reliably worked was to catch the exception at insert() in MeshNetworkDb.java:

    void insert(@NonNull final ProvisionerDao dao, @NonNull final Provisioner provisioner) {
        databaseWriteExecutor.execute(() -> {
            // Note: The try/catch block below is needed to fix a crash caused by an SQLiteConstraintException reported at Google Play console in app version 1.0.2 b34.
            //       The exception should happen only if the provisioner is already in the database, so it should be OK to just log it an continue...
            try {
                dao.insert(provisioner);
            } catch (Exception e) { 
                Log.e(TAG, "Inserting provisioner failed! "+e.getMessage());
                e.printStackTrace();
            }
        });
    }

This is a pretty hackish fix, but the only thing that worked. What do you think about it? Could you please catch the exception in insert()? It would help a lot. Any better idea is more than welcome!

Again, thanks for your help.

roshanrajaratnam commented 2 years ago

question here though, what do you mean by adding nodes? are you referring to provisioning them in to the network?

Edit: Does this mean provisioning fails in your case? How can I reproduce this issue?

FluffyBunniesTasteTheBest commented 2 years ago

question here though, what do you mean by adding nodes? How can I reproduce this issue?

Reproducing the issue is difficult, but the way it looks I assume that the user imported a mesh network from another device, which appears to be the only situation where multiple nodes are added in MeshNetworkDb.insertNetwork().

Unfortunately the crashes reported by Google Play are the only information available, making this a rather tough one to fix.

Edit: I appears MeshNetworkDb.insertNetwork() can cause the ConcurrentModificationException if any of the provisioners, nodes, groups or scenes to insert change. Btw, @Mavericksb quickfix to simply copy the arrays seems to work quite well.

I haven't been able to reproduce it myself and beside that it was only reported on Galaxy A71 devices.

Again, thanks for your help!

roshanrajaratnam commented 2 years ago

@FluffyBunniesTasteTheBest I think I have found where the ConcurrentModificationException is being thrown during the import and also when removing an item from the list of networkExclusions. Could you give this a spin please?

Edit: I managed to break something with f276d09 please ignore that for now.

daretobeorjan commented 2 years ago

Hi, watching this issue since we have similar problems.

Doesn't the new getNetworkExclusions also need to make a copy of the ArrayList(s) that is the value of the map as well?

roshanrajaratnam commented 2 years ago

Hi yes I am working on this at the moment. I would say that returning an unmodifiable list should work when inserting to db. What is your opinion on this?

daretobeorjan commented 2 years ago

Yeah, I agree. I did a quick test with a re-implementation of getNetworkExclusions as such:

        return networkExclusions.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, e -> new ArrayList(e.getValue())));

It seemed to get rid of the concurrent modification crashes completely. Probably shouldn't use streams, but I mostly work with Kotlin so it's the quickest way to prototype something for me.

roshanrajaratnam commented 2 years ago

Good point. So it could be that the arraylist within the map is not an unmodifiableList causing this crash. Unfortunately cannot use streams on my end due to API limitations. I will push a fix soon, if you can please give it a test because I am having a hard time reproducing this.

roshanrajaratnam commented 2 years ago

@R0m4in-dooz I think I may have found the issue relating to the ConcurrentModificationException when sending a ConfigNodeReset According my logs it seems like a timing issue following update method is called for , currently the following update is called when sending a message and I notice that for larger networks the db update is not completed even until the ConfigNodeResetStatus is received causing this ConcurrentModificationException to be thrown when calling network.getNodes(). The only way around it seemed to be @Mavericksb solution of copying the list in to a new list because creating an unmodifiable list seems to not work in this case due to timing in my opinion.

https://github.com/NordicSemiconductor/Android-nRF-Mesh-Library/blob/cb127405c5acc859ea39e360bf195944afe587ec/mesh/src/main/java/no/nordicsemi/android/mesh/MeshNetworkDb.java#L211-L231

In addition to this, I've also made the getNetworkExclusions() return an unmodifiable map containing an unmodifiable list of addresses for a given IV Index.

@R0m4in-dooz would really appreciate if you could test this. It seemed to work on my end with your network JSON to which I provisioned new nodes.

ghost commented 2 years ago

Hello @roshanrajaratnam ! Ok i may have time to test it this week. I should pull latest dev HEAD I suppose ?

roshanrajaratnam commented 2 years ago

Great, yes pull the latest Dev branch and you should get some other fix relating to an invalid 16-bit validation.

FluffyBunniesTasteTheBest commented 2 years ago

@roshanrajaratnam please excuse the delay, it's a little busy right now.

The ConcurrentModificationException unfortunately still happens with version 3.1.8:

E/AndroidRuntime( 7576): java.util.ConcurrentModificationException
E/AndroidRuntime( 7576):    at java.util.ArrayList$Itr.next(ArrayList.java:860)
E/AndroidRuntime( 7576):    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1089)
E/AndroidRuntime( 7576):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:96)
E/AndroidRuntime( 7576):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:61)
E/AndroidRuntime( 7576):    at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
E/AndroidRuntime( 7576):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:208)
E/AndroidRuntime( 7576):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:145)
E/AndroidRuntime( 7576):    at com.google.gson.Gson.toJson(Gson.java:735)
E/AndroidRuntime( 7576):    at com.google.gson.Gson.toJson(Gson.java:714)
E/AndroidRuntime( 7576):    at com.google.gson.Gson.toJson(Gson.java:669)
E/AndroidRuntime( 7576):    at com.google.gson.Gson.toJson(Gson.java:649)
E/AndroidRuntime( 7576):    at no.nordicsemi.android.mesh.MeshTypeConverters.networkExclusionsToJson(MeshTypeConverters.java:157)
E/AndroidRuntime( 7576):    at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$6(MeshNetworkDb.java:223)
E/AndroidRuntime( 7576):    at no.nordicsemi.android.mesh.MeshNetworkDb$$ExternalSyntheticLambda1.run(Unknown Source:16)
E/AndroidRuntime( 7576):    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
E/AndroidRuntime( 7576):    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
E/AndroidRuntime( 7576):    at java.lang.Thread.run(Thread.java:919)

It's quite easy to provoke the crash using a test routine that:

  1. Scans for unprovisioned devices and takes the first device found.
  2. Provisions the device.
  3. Sends a ConfigNodeReset message to unprovision the node.

I'm using a one second delay between each of the steps and it repeatedly crashes after 30-50 iterations.

roshanrajaratnam commented 2 years ago

@FluffyBunniesTasteTheBest sorry it's been a tricky one to fix here, I see that this is happening when updating the db. I have included a fix that copies the network exclusions to a new map before updating the db. Please give it a shot and let me know how it goes. Also I did some fixes for #490 so make sure to pull the latest dev branch.

FluffyBunniesTasteTheBest commented 2 years ago

@roshanrajaratnam Thanks for the quick fix!

It looks promising so far: 120 iterations without any issues. Argh... Murphy's Law! Crash just while writing these lines...

E/AndroidRuntime(10318): java.util.ConcurrentModificationException
E/AndroidRuntime(10318):    at java.util.ArrayList$Itr.next(ArrayList.java:860)
E/AndroidRuntime(10318):    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1089)
E/AndroidRuntime(10318):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:96)
E/AndroidRuntime(10318):    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:61)
E/AndroidRuntime(10318):    at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
E/AndroidRuntime(10318):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:208)
E/AndroidRuntime(10318):    at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.write(MapTypeAdapterFactory.java:145)
E/AndroidRuntime(10318):    at com.google.gson.Gson.toJson(Gson.java:704)
E/AndroidRuntime(10318):    at com.google.gson.Gson.toJson(Gson.java:683)
E/AndroidRuntime(10318):    at com.google.gson.Gson.toJson(Gson.java:638)
E/AndroidRuntime(10318):    at com.google.gson.Gson.toJson(Gson.java:618)
E/AndroidRuntime(10318):    at no.nordicsemi.android.mesh.MeshTypeConverters.networkExclusionsToJson(MeshTypeConverters.java:157)
E/AndroidRuntime(10318):    at no.nordicsemi.android.mesh.MeshNetworkDb.lambda$update$6(MeshNetworkDb.java:221)
E/AndroidRuntime(10318):    at no.nordicsemi.android.mesh.MeshNetworkDb$$ExternalSyntheticLambda1.run(Unknown Source:16)
E/AndroidRuntime(10318):    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
E/AndroidRuntime(10318):    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
E/AndroidRuntime(10318):    at java.lang.Thread.run(Thread.java:919)

Happened with dev branch at a0774c137c4a5b76ed6066b53d40f611394d2f26.

Edit: One detail that could be important when trying to reproduce this: The mesh network contains four other nodes, which are plugged off while running the infinite-provision-test-loop.

Edit 2: This one can be quite tricky to trigger. Most of the time, if it happens, it happens between the 20th and 50th iteration - rarely 100, but once it took even 787 iterations.