delight-im / Android-DDP

[UNMAINTAINED] Meteor's Distributed Data Protocol (DDP) for clients on Android
Apache License 2.0
274 stars 54 forks source link

ConcurrentModificationException #125

Open najeebullahshah opened 7 years ago

najeebullahshah commented 7 years ago

Crashlytics has been frequently reporting the following exception in my app. I am unable to reproduce it. Any ideas of why it is happening?

java.util.LinkedList$LinkIterator.next (LinkedList.java:124) im.delight.android.ddp.CallbackProxy.onDisconnect (CallbackProxy.java:67) im.delight.android.ddp.Meteor$1.onClose (Meteor.java:166) com.firebase.tubesock.WebSocket.closeSocket (WebSocket.java:230) com.firebase.tubesock.WebSocket.onCloseOpReceived (WebSocket.java:212) com.firebase.tubesock.WebSocketReceiver.run (WebSocketReceiver.java:61) com.firebase.tubesock.WebSocket.runReader (WebSocket.java:372) com.firebase.tubesock.WebSocket.access$000 (WebSocket.java:30) com.firebase.tubesock.WebSocket$2.run (WebSocket.java:108) java.lang.Thread.run (Thread.java:818)

kschingiz commented 7 years ago

See about it here: https://github.com/delight-im/Android-DDP/issues/117

ocram commented 7 years ago

@kschingiz Thanks for helping out! I don't think that these two issues are related, however.

@najeebullahshah Your issue seems to be that you're calling either Meteor#addCallback(MeteorCallback), Meteor#removeCallback(MeteorCallback) or Meteor#removeCallbacks() while the connection is being closed.

Since that has not been reported more frequently, can you please check your code for some mistakes, first? Please pay special attention to the handlers where you disconnect manually.

Otherwise, perhaps we'll have to consider fixing synchronization here if this happens due to accesses from different threads.

najeebullahshah commented 7 years ago

@ocram thanks. I am checking for the suggested solution throughout the app. I am not sure but is addCallback related to connection being closed? Because what i am understanding is that we should add the call back once we get meteor connection in onConnected but how can we get onConnected when we didn't yet used addCallback

ocram commented 7 years ago

You're right, you have to call Meteor#addCallback(MeteorCallback) before calling Meteor#connect(), just as in the examples shown in the README. This way, you will be informed about the onConnected event in your callback.

ocram commented 7 years ago

@najeebullahshah Can you please try replacing your dependency with

compile 'com.github.delight-im:Android-DDP:b5dcbc0ffaaf4b40ae70da70d5b58138c07c16b5'

as a potential fix? Would love to know whether this fixes the problem or whether you can still re-produce the bug. Thanks!

This adds some synchronization using synchronized (...) { ... } blocks. I don't think calling Collections#synchronizedList(java.util.List) is necessary here.

najeebullahshah commented 7 years ago

@ocram thanks alot man, much appreciated.

ocram commented 7 years ago

Please report whether this version fixes the bug for you. This is really important. If the bug is fixed, we can merge this into the main branch for all users. If the bug is not fixed, we must investigate further.

najeebullahshah commented 7 years ago

I have updated the gradle version with your one. The mentioned exception happens randomly. I am not aware of any special scenario in which the exception happens. Let me publish the new apk and in two days i'll let you know whether i receive any crash report or not

najeebullahshah commented 7 years ago

It is still happening:

java.util.LinkedList$LinkIterator.next (LinkedList.java:124) im.delight.android.ddp.CallbackProxy.onDataRemoved (CallbackProxy.java:127) im.delight.android.ddp.Meteor.handleMessage (Meteor.java:546) im.delight.android.ddp.Meteor.access$400 (Meteor.java:41) im.delight.android.ddp.Meteor$1.onMessage (Meteor.java:176) com.firebase.tubesock.WebSocketReceiver.appendBytes (WebSocketReceiver.java:113) com.firebase.tubesock.WebSocketReceiver.run (WebSocketReceiver.java:69) com.firebase.tubesock.WebSocket.runReader (WebSocket.java:372) com.firebase.tubesock.WebSocket.access$000 (WebSocket.java:30) com.firebase.tubesock.WebSocket$2.run (WebSocket.java:108)

also onException:

java.util.LinkedList$LinkIterator.next (LinkedList.java:124) im.delight.android.ddp.CallbackProxy.onException (CallbackProxy.java:147) im.delight.android.ddp.Meteor$1.onError (Meteor.java:186) com.firebase.tubesock.WebSocket.send (WebSocket.java:167) com.firebase.tubesock.WebSocket.send (WebSocket.java:149) im.delight.android.ddp.Meteor.send (Meteor.java:324) im.delight.android.ddp.Meteor.send (Meteor.java:303) im.delight.android.ddp.Meteor.initConnection (Meteor.java:267) im.delight.android.ddp.Meteor.access$300 (Meteor.java:41) im.delight.android.ddp.Meteor$1.onOpen (Meteor.java:145) com.firebase.tubesock.WebSocket.runReader (WebSocket.java:371) com.firebase.tubesock.WebSocket.access$000 (WebSocket.java:30) com.firebase.tubesock.WebSocket$2.run (WebSocket.java:108) java.lang.Thread.run (Thread.java:856)

ocram commented 7 years ago

Thanks!

From the stack trace, it becomes obvious that either

Please check again!

By the way, it's hard to resolve this if we have no way to re-produce this consistently. But perhaps it'll work.

najeebullahshah commented 7 years ago

I've verified the dependency used is the following one:

compile 'com.github.delight-im:Android-DDP:b5dcbc0ffaaf4b40ae70da70d5b58138c07c16b5'

and also verified that the crash is reported with versionCode which is build with the above dependency.

I am trying to reproduce and know the exact scenario why it happens, let see if testing it leads to anything.

ocram commented 7 years ago

That's strange, because the lines from your stack trace don't match the lines of the updated code where the error may appear. But having a reliable way to re-produce this would be great, perhaps you'll find one. Thanks!

najeebullahshah commented 7 years ago

I have changed and then synced gradle. Do you still think that there is a chance it won't be updated?

ocram commented 7 years ago

No, that sounds good, actually. This is why a reproducible test case would be great to have.

Don't know what else to do right now. Sorry!

najeebullahshah commented 7 years ago

@ocram I have replace

private final List mCallbacks = new LinkedList();

with

private final Queue mCallbacks = new ConcurrentLinkedQueue();

and for the past 4 days i haven't received any crash.

ocram commented 7 years ago

@najeebullahshah Thanks a lot for trying that!

I would have loved to know whether simply changing

private final List mCallbacks = new LinkedList();

to

private final List mCallbacks = Collections.synchronizedList(new LinkedList());

would have been sufficient as well.

But now that you (apparently) have a solution, you're probably not interested in trying other possible solutions, right?

Anyway, do you know (or is there a chance you could find out) whether ConcurrentLinkedQueue alone is sufficient or whether the synchronized statements added in https://github.com/delight-im/Android-DDP/commit/b5dcbc0ffaaf4b40ae70da70d5b58138c07c16b5 are necessary as well?

The way to find out is basing your change either on the master branch or on the issue-125 branch. What did you do? Any chance you could base your change on the master branch if you didn't do so yet?

I assume that the changes in the issue-125 branch are not necessary and your solution alone on top of master would be sufficient. So that's what I'd like to implement as a solution here (if it works).

najeebullahshah commented 7 years ago

@ocram I have added the library as a reference into the project instead of gradle. Do you want me to update "CallbackProxy" file with your changes and then check if the issue is occuring or not?

ocram commented 7 years ago

@najeebullahshah Yes, that's exactly the correct process, and it would be awesome if you could do that for us! Thanks a lot in advance!

It seems you updated CallbackProxy with ConcurrentLinkedQueue already. First, I'd just like to ask you to base this change on the current master branch, not on the issue-125 branch. If that worked, that would give us a fix that we can merge.

By the way, can you first check what you did so far, please? What branch did you base your fix on? Do you have synchronized statements in your updated CallbackProxy class?

On the other hand, it may be worth trying

private final List mCallbacks = Collections.synchronizedList(new LinkedList());

instead. But this is just an alternative solution.

najeebullahshah commented 7 years ago

I have used the following library as a reference:

https://github.com/delight-im/Android-DDP/tree/master/Source/library

I am updating the CallBackProxy with the file you updated in b5dcbc0 . Today i received report for onDataAdded and onDataRemoved for the fix that i integrated. Let's hope your solution fixes this problem :)

ocram commented 7 years ago

Yes, the following version of the library is the one that needs the tests:

https://github.com/delight-im/Android-DDP/tree/master/Source/library

You said that you were updating CallbackProxy with the changes from https://github.com/delight-im/Android-DDP/commit/b5dcbc0ffaaf4b40ae70da70d5b58138c07c16b5. This is actually what's not optimal.

It would be better to have a test without those changes. That means the library version mentioned above needs to be tested with only the potential fix that you found, which is the introduction of ConcurrentLinkedQueue instead of LinkedList. That would be perfect!

You know, if we knew that the library version mentioned above worked without any issues when your fix of one or two lines is applied, we could merge that simple fix back into the library and the issue would be resolved for everybody having problems with that :)

najeebullahshah commented 7 years ago

Actually the solution that i suggested has no crashes for a week or so but yesterday i received two reports so it will be better to try your solution and see if it solves the issue or not.

ocram commented 7 years ago

Oh, that's bad news :(

Hope the updated fix is more useful then. But again, it might seem fine for a week with errors starting to occur way later.

By the way, you've made sure, as always, that the reports are for the version with the fix? Same lines in the stack trace?

Thanks for your support!