TokTok / c-toxcore

The future of online communications.
https://tox.chat
GNU General Public License v3.0

High CPU and network usage with 1000+ friends #2178

Open alexbakker opened 2 years ago

alexbakker commented 2 years ago

I maintain EchoBot, a small service for Tox that allows users to test their audio and video. It runs version 0.2.17 of c-toxcore. We've recently started seeing very high CPU and network usage: 100% CPU on the Tox thread and around 28 Mbit/s of network transmission, continuously. EchoBot has always been rough on resources when starting up, but it would settle down eventually. Now that it has ~1400 friends, it no longer seems to settle down.

I used JFreegman's netprof branch of toxcore to monitor the types of packets being sent. Here's a chart of the top packet types over the course of half a day:

Toxcore sends roughly 7500 ONION_SEND_INITIAL packets per second, continuously. @JFreegman provided some patches to try to nail down why toxcore never seems to back off from sending so many onion packets. He found that in this case toxcore regularly thinks that we're no longer connected to the Tox network and then resets the announcement run counter:

https://github.com/TokTok/c-toxcore/blob/7dde71c4e9d4ff8e293b70f5c3ac08a504cc36f6/toxcore/onion_client.c#L1886-L1895

I think there are possibly two things to do here:

emdee-is commented 2 years ago

Did this get addressed?

If not, could toxcore add a rate limit so that it doesn't try to reconnect more than x times a second, with x settable at compile time or from an environment variable?
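To make the suggestion concrete, here is a rough sketch of what such a limit could look like. It is not toxcore code: `Reconnect_Limiter`, `RECONNECT_MAX_PER_SECOND`, and the environment variable name passed to `reconnect_limit_from_env` are all made up for illustration, and the caller is assumed to supply timestamps from a monotonic millisecond clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical compile-time default: attempts allowed per second. */
#ifndef RECONNECT_MAX_PER_SECOND
#define RECONNECT_MAX_PER_SECOND 2
#endif

/* Fixed-window limiter: counts attempts inside the current one-second window. */
typedef struct Reconnect_Limiter {
    uint32_t max_per_second;  /* attempts allowed per window */
    uint64_t window_start_ms; /* start of the current window */
    uint32_t used;            /* attempts already granted in this window */
} Reconnect_Limiter;

/* Read the limit from an environment variable (the name is an assumption),
 * falling back to the compile-time default when the variable is unset,
 * not a plain positive number, or implausibly large. */
uint32_t reconnect_limit_from_env(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    if (value == NULL) {
        return RECONNECT_MAX_PER_SECOND;
    }
    char *end = NULL;
    unsigned long parsed = strtoul(value, &end, 10);
    if (end == value || *end != '\0' || parsed == 0 || parsed > 1000000UL) {
        return RECONNECT_MAX_PER_SECOND;
    }
    return (uint32_t)parsed;
}

void reconnect_limiter_init(Reconnect_Limiter *limiter, uint32_t max_per_second,
                            uint64_t now_ms)
{
    assert(limiter != NULL && max_per_second > 0);
    limiter->max_per_second = max_per_second;
    limiter->window_start_ms = now_ms;
    limiter->used = 0;
}

/* Returns true if a reconnect attempt may be made at time now_ms. */
bool reconnect_limiter_allow(Reconnect_Limiter *limiter, uint64_t now_ms)
{
    assert(limiter != NULL);
    /* Start a fresh window once a full second has passed. */
    if (now_ms - limiter->window_start_ms >= 1000) {
        limiter->window_start_ms = now_ms;
        limiter->used = 0;
    }
    if (limiter->used >= limiter->max_per_second) {
        return false;
    }
    ++limiter->used;
    return true;
}
```

The idea would be to call `reconnect_limiter_allow` right before the code path that decides we have dropped off the network and restarts announcing, and to skip the restart when it returns false. A fixed window is the simplest thing that matches "x times a second"; a token bucket would smooth out bursts at window boundaries.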

AndyTOX commented 2 years ago

Hi, how about a workaround until the issue gets properly fixed? I would propose that EchoBot "forget" a friend after an hour. This way you never get 1000+ concurrent users. The most common EchoBot use cases are a first-time test and a re-test after hardware or environment changes. A one-hour "test window" would be fine in, I guess, 99% of all use cases. If not, just remove EchoBot and re-add it to get another hour. RFC.
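A rough sketch of the bookkeeping this would need on the bot side, assuming the bot records a timestamp whenever it accepts a friend. `Friend_Entry`, `FRIEND_TTL_MS`, and `friends_collect_expired` are hypothetical names, not part of EchoBot or toxcore; the actual removal would presumably go through toxcore's `tox_friend_delete` for each friend number returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hour, as proposed above (hypothetical constant). */
#define FRIEND_TTL_MS (60u * 60u * 1000u)

typedef struct Friend_Entry {
    uint32_t friend_number; /* friend number as handed out by toxcore */
    uint64_t added_ms;      /* time at which the bot accepted the friend */
} Friend_Entry;

/* Removes entries older than FRIEND_TTL_MS from `friends`, compacting the
 * array in place and updating *count. The friend numbers of the removed
 * entries are written to `expired` (at most expired_cap of them; any
 * further expired entries stay in the list until the next call).
 * Returns the number of friend numbers written to `expired`. */
size_t friends_collect_expired(Friend_Entry *friends, size_t *count,
                               uint64_t now_ms, uint32_t *expired,
                               size_t expired_cap)
{
    assert(friends != NULL && count != NULL && expired != NULL);
    size_t kept = 0;
    size_t n_expired = 0;

    for (size_t i = 0; i < *count; ++i) {
        const int is_old = now_ms >= friends[i].added_ms &&
                           now_ms - friends[i].added_ms >= FRIEND_TTL_MS;
        if (is_old && n_expired < expired_cap) {
            /* The caller is expected to delete this friend in toxcore. */
            expired[n_expired++] = friends[i].friend_number;
        } else {
            friends[kept++] = friends[i];
        }
    }

    *count = kept;
    return n_expired;
}
```

The bot could run this once a minute from its main loop. Users who get dropped would simply re-add EchoBot to start another hour, as described above.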