High CPU usage and long receive pause - Githubissues

AsamK / signal-cli

signal-cli provides an unofficial commandline, JSON-RPC and dbus interface for the Signal messenger.

GNU General Public License v3.0

High CPU usage and long receive pause #1477

Closed by k-bx 8 months ago

k-bx commented 8 months ago

I have three servers running a signal-cli-native receive loop (inside the signal-cli-rest-api wrapper). Two of them are fine and relatively snappy. The third used to be fine, but it has started behaving like this:

upon receive, signal-cli-native does 100% CPU for 30+ seconds
after that it does reply
memory grows from ~100MB up to 1600MB, drops immediately upon receiving

The 30+ seconds have become 40+ seconds and now even 50+. Before it gets any worse, is there a way I can identify the root cause?

Upgrading to latest master didn't help:

ubuntu@echo-dn:~$ /usr/bin/signal-cli-native --version
signal-cli 0.13.1-SNAPSHOT

k-bx commented 8 months ago

@AsamK happy to DM you the logs which clearly show what the program is busy with. Would you take a look at them? What's the best way to do this? 🙏

k-bx commented 8 months ago

Here are more details (without the need to DM any logs). The pause is now 60+ seconds (yes, still growing). Here's the command I ran:

sudo signal-cli-native --config <CONFIG> --verbose -a '<PHONE>' --output json receive --max-messages=1 &> signal-cli-native-receive-$(date -u +%Y%m%d-%H%M).txt

And here's what the output looks like, repeated in a loop for those 60 seconds:

2024-02-25T11:28:32.586Z [main] INFO  LibSignal - [libsignal]: rust/protocol/src/sealed_sender.rs:441: deserialized UnidentifiedSenderMessageContent from <ID> with type PreKey
2024-02-25T11:28:32.586Z [main] ERROR LibSignal - [libsignal]: rust/protocol/src/session_cipher.rs:220: Message from <ID> failed to decrypt; sender ratchet public key <ID> message counter 499
No current session
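
To see which senders account for the pile-up, the captured log can be summarized by counting the "failed to decrypt" lines per sender. The following is a minimal sketch, assuming the ERROR line format shown in the excerpt above; the class name DecryptFailureCounter and its methods are hypothetical and not part of signal-cli.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of signal-cli.
final class DecryptFailureCounter {
    // Assumes the libsignal ERROR line format seen in the log excerpt:
    // "... Message from <sender> failed to decrypt; ..."
    private static final Pattern FAILED =
            Pattern.compile("Message from (\\S+) failed to decrypt");

    /** Counts decrypt-failure lines per sender ID in the given log lines. */
    static Map<String, Integer> countBySender(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = FAILED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the file written by the receive command above.
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        countBySender(lines).forEach((sender, n) -> System.out.println(n + "\t" + sender));
    }
}
```

If one or two senders dominate the counts, that would point at a small set of stale sessions rather than a general slowdown.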

k-bx commented 8 months ago

@AsamK ok, so this helped greatly, receive time is now back to <10s:

    public void receiveMessages(
            Duration timeout, boolean returnOnTimeout, Integer maxMessages, Manager.ReceiveMessageHandler handler
    ) throws IOException {
        // needsToRetryFailedMessages = true;
        needsToRetryFailedMessages = false;

What are the exact implications of this? What's the proper way to handle the error pile-up I am experiencing?

Would really appreciate help!

k-bx commented 8 months ago

If retry is necessary, would you agree to reduce the envelope.getServerDeliveredTimestamp() > 1000L * 60 * 60 * 24 * 30 interval (30 days) to 24 hours? That should help too. Should it be customizable, or is it OK to just hardcode 24h?
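
For reference, the arithmetic behind the two intervals can be checked with a short sketch. The class RetryWindow and the method isExpired are hypothetical names used only for illustration, and the age comparison is an assumption about how such a threshold would typically be applied, not a copy of signal-cli's code.

```java
import java.time.Duration;

// Hypothetical illustration, not signal-cli code.
final class RetryWindow {
    // The literal quoted above, in milliseconds. The L suffix matters:
    // the product exceeds Integer.MAX_VALUE, so int arithmetic would overflow.
    static final long THIRTY_DAYS_MS = 1000L * 60 * 60 * 24 * 30;

    // The proposed shorter window, in milliseconds.
    static final long ONE_DAY_MS = Duration.ofHours(24).toMillis();

    /** True if a message delivered at deliveredMs is older than maxAgeMs at nowMs (assumed semantics). */
    static boolean isExpired(long nowMs, long deliveredMs, long maxAgeMs) {
        return nowMs - deliveredMs > maxAgeMs;
    }

    public static void main(String[] args) {
        System.out.println(THIRTY_DAYS_MS);                             // 2592000000
        System.out.println(Duration.ofMillis(THIRTY_DAYS_MS).toDays()); // 30
        System.out.println(ONE_DAY_MS);                                 // 86400000
    }
}
```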

AsamK commented 8 months ago

The messages that get retried there are messages that were received from a recipient after their identity key (safety number) changed. Messages that fail to be processed for any reason other than a changed identity key should be deleted automatically; maybe something is going wrong there. I'd rather not decrease the interval, as it might cause message loss for some users.
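
The intended behavior described here can be summarized as a small decision rule. This is only a sketch of that description: every name in it (RetryPolicySketch, FailureReason, Action, decide) is hypothetical, and the rule that over-age messages are dropped is an assumption inferred from the interval discussed in this thread, not taken from signal-cli's source.

```java
import java.time.Duration;

// Hypothetical model of the described policy, not signal-cli code.
final class RetryPolicySketch {
    enum FailureReason { UNTRUSTED_IDENTITY, OTHER }
    enum Action { KEEP_FOR_RETRY, DELETE }

    /**
     * Only failures caused by a changed identity key are kept for retry,
     * and (by assumption) only while the message is no older than maxAge.
     * Everything else is deleted.
     */
    static Action decide(FailureReason reason, Duration age, Duration maxAge) {
        if (reason != FailureReason.UNTRUSTED_IDENTITY) {
            return Action.DELETE;
        }
        return age.compareTo(maxAge) > 0 ? Action.DELETE : Action.KEEP_FOR_RETRY;
    }
}
```

Under this model, a cache that keeps growing with "No current session" failures would suggest that the automatic deletion of non-identity failures is not taking effect.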

k-bx commented 8 months ago

@AsamK thank you!