Closed holmesworcester closed 2 years ago
the solution should include a nectar/waggle regression test for this, I think.
Here's an example of what I'm seeing. Note that it's from an account I've already seen messages from, which invalidates the hypothesis that missing messages are only due to slowness syncing the user table.
Also, it's the older instance that is missing the messages. So there's some other issue here.
Ideas:
My memory is that we've been seeing this issue for a while.
It seems not to be a problem with sagas' logic for verification/filtering out messages https://github.com/ZbayApp/monorepo/pull/395
This happened again in Quiet alpha 5. It happened when my Mac version reconnected to the network after being asleep for a while. It synced some but not all new messages.
Did those messages never arrive, or did they arrive with a big lag?
Did you manage to find a repeatable way to see this problem? You wrote that it happened after the computer was asleep for a while. Does it always happen this way?
I don't have the machine where the issue happened, so I can't say. You could possibly check to confirm this by using the files I sent you for the data directories.
I didn't manage to find steps to reproduce it.
Edit: I could only make it work on Windows (aka "second machine"). I am not sure if that's because of the OS or the fact that I unplugged the Ethernet cable, but I couldn't make it work the other way (Linux being disconnected, Windows sending messages).
I managed to reproduce a similar problem, but only by disconnecting one of the Quiet apps from the network without closing it. These are the steps I took:
After quiet3 and quiet2 reconnected they were able to send and receive new messages, but no replication of the past messages happened.
New discovery:
Missing message "I received a message but Windows did not start replicating missing messages. Will it trigger now?"
The logs show that orbitdb did receive this entry but it didn't trigger the replicate.progress event, and that's why we are missing it in our app.
This can be a different case than the one I described in the comment above, because it was triggered by reopening quiet1 again at some point during testing.
Attaching all logs from app with the broken state and part of the logs from the app with a proper state: app1MissingMessage.log app2AllMessages.log app1MissingMessagesFinalSnapshot.log
The state didn't heal on sending or receiving messages, and it also didn't heal on restarting the apps. That makes sense, since the entry is already saved in the local orbitdb store. However, it's good news, because in this case we just have to implement a mechanism that makes sure we have gathered all the needed entries.
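A minimal sketch of what such a mechanism could look like, assuming the app can list every entry in the local store and knows which entry hashes it has already displayed. The `StoreEntry` shape and the `findMissingEntries` and `reconcile` helpers are hypothetical names for illustration only. They are not the orbit-db API or the workaround that was actually shipped.

```typescript
// Hypothetical shape for illustration; not the orbit-db or Quiet types.
interface StoreEntry {
  hash: string; // identifier of the log entry
  payload: string; // message body
}

// Return the store entries that the app has never displayed,
// e.g. because the event for them never fired.
function findMissingEntries(
  storeEntries: StoreEntry[],
  displayed: Map<string, string>
): StoreEntry[] {
  return storeEntries.filter((entry) => !displayed.has(entry.hash));
}

// Add any missing entries to the app state and report how many were healed.
function reconcile(
  storeEntries: StoreEntry[],
  displayed: Map<string, string>
): number {
  const missing = findMissingEntries(storeEntries, displayed);
  for (const entry of missing) {
    displayed.set(entry.hash, entry.payload);
  }
  return missing.length;
}

// Example: the store holds three entries, but the app never saw "h2".
const store: StoreEntry[] = [
  { hash: "h1", payload: "hello" },
  { hash: "h2", payload: "Will it trigger now?" },
  { hash: "h3", payload: "bye" },
];
const displayed = new Map<string, string>([
  ["h1", "hello"],
  ["h3", "bye"],
]);
const healed = reconcile(store, displayed);
```

Running the check again after a successful pass finds nothing to add, so it could run on startup or on a timer. Healed messages are appended to the map here, so a real implementation would still need to sort them into log order before display.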
https://github.com/orbitdb/orbit-db-store/issues/122 Created issue in orbit-db-store for the case described above ^
We decided to close this task because we already have a workaround on our side, so it should not affect user experience anymore. The orbitdb guys don't know what the cause of this could be, but they will be working on the replicator rewrite anyway, so the best thing right now is to move on and wait for their refactoring.
Right now some messages never display and we don't know why. It happens with users we've seen before, not just new users.
I posted some logs to slack showing a case where this happens.
https://zbay.slack.com/files/UTAQELTJ8/F038LS34KCM/archive.zip