Closed by k-bx 8 months ago
@AsamK happy to DM you the logs which clearly show what the program is busy with. Would you take a look at them? What's the best way to do this? 🙏
For more detail without needing to DM any logs: receive now takes 60+ seconds (yes, still growing). Here's the command I ran:
sudo signal-cli-native --config <CONFIG> --verbose -a '<PHONE>' --output json receive --max-messages=1 &> signal-cli-native-receive-$(date -u +%Y%m%d-%H%M).txt
And here's the output that repeats in a loop for those 60 seconds:
2024-02-25T11:28:32.586Z [main] INFO LibSignal - [libsignal]: rust/protocol/src/sealed_sender.rs:441: deserialized UnidentifiedSenderMessageContent from <ID> with type PreKey
2024-02-25T11:28:32.586Z [main] ERROR LibSignal - [libsignal]: rust/protocol/src/session_cipher.rs:220: Message from <ID> failed to decrypt; sender ratchet public key <ID> message counter 499
No current session
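To check whether these failures come from a single sender or from many, the captured log can be tallied per sender. A minimal sketch, assuming the log lines keep the "Message from <ID> failed to decrypt" shape shown above; the class name DecryptFailureTally and its tally method are made up for illustration and are not part of signal-cli:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of signal-cli: counts decrypt failures per sender.
class DecryptFailureTally {
    // Assumed line shape, taken from the log excerpt above.
    private static final Pattern FAILURE =
            Pattern.compile("Message from (\\S+) failed to decrypt");

    // Returns sender ID -> number of "failed to decrypt" lines, sorted by sender ID.
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = FAILURE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DecryptFailureTally <logfile>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        tally(lines).forEach((sender, n) -> System.out.println(n + "\t" + sender));
    }
}
```

Pointing it at the file written by the receive command above prints one line per sender with its failure count. If a handful of senders account for nearly all failures, that would point at specific stale sessions rather than a general slowdown.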
@AsamK OK, this change helped greatly; receive time is now back to <10s:
public void receiveMessages(
        Duration timeout, boolean returnOnTimeout, Integer maxMessages, Manager.ReceiveMessageHandler handler
) throws IOException {
    // needsToRetryFailedMessages = true;
    needsToRetryFailedMessages = false;
What are the exact implications of this? What's the proper way to handle the error pile-up I'm experiencing?
Would really appreciate help!
If retry is necessary, would you agree to reduce the interval in envelope.getServerDeliveredTimestamp() > 1000L * 60 * 60 * 24 * 30 (30 days) to 24 hours? That should help too. Should it be customizable, or is it OK to just hardcode 24h?
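For reference, here is the arithmetic behind the two windows as a small sketch. It assumes the check compares the envelope's age (now minus the server-delivered timestamp) against the window; the thread only quotes a fragment of the expression, so that is a guess. RetryWindow and isOlderThan are hypothetical names, not signal-cli code:

```java
import java.time.Duration;

// Hypothetical illustration of the retry-window arithmetic; not signal-cli code.
class RetryWindow {
    // 1000L * 60 * 60 * 24 * 30 ms, the window quoted in the thread.
    static final Duration CURRENT = Duration.ofDays(30);
    // The shorter window proposed above.
    static final Duration PROPOSED = Duration.ofHours(24);

    // Assumption: an envelope is "too old" when its age exceeds the window.
    static boolean isOlderThan(long serverDeliveredTimestampMs, long nowMs, Duration window) {
        return nowMs - serverDeliveredTimestampMs > window.toMillis();
    }

    public static void main(String[] args) {
        System.out.println(CURRENT.toMillis());   // 2592000000
        System.out.println(PROPOSED.toMillis());  // 86400000

        long now = System.currentTimeMillis();
        long threeDaysAgo = now - Duration.ofDays(3).toMillis();
        System.out.println(isOlderThan(threeDaysAgo, now, CURRENT));   // false
        System.out.println(isOlderThan(threeDaysAgo, now, PROPOSED));  // true
    }
}
```

The L suffix in the original expression matters: 2,592,000,000 does not fit in a 32-bit int, so the product has to be computed as a long. Under the assumption above, a three-day-old envelope is still inside the 30-day window but would fall outside a 24-hour one.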
The messages that get retried there are messages that were received from a recipient after their identity key (safety numbers) changed. Messages that fail to be processed for any reason other than a changed identity key should be deleted automatically; maybe something is going wrong there. I'd rather not decrease the interval, as it might cause message loss for some users.
@AsamK thank you!
I have three servers running a signal-cli-native receive loop (inside the signal-cli-rest-api wrapper). Two servers are fine and relatively snappy. The third was fine previously, but it has started showing this behavior:
The 30+ seconds have become 40+ seconds and now even 50+. Before it grows further, is there any way I can identify the root cause?
Upgrading to the latest master didn't help: