element-hq / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://element-hq.github.io/synapse
GNU Affero General Public License v3.0

[Presence] Huge spike in CPU usage/Federation traffic approximately every 25 minutes #15878

Open matrixbot opened 11 months ago

matrixbot commented 11 months ago

This issue has been migrated from #15878.


There is a timer that sends a presence ping over federation every 25 minutes, which keeps a user from being marked 'offline' before the 30-minute timeout hits.
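For illustration, here is a minimal sketch of how the two timers relate. The constant and function names are hypothetical and chosen for this example; only the 25-minute and 30-minute figures come from the description above, and this is not a copy of Synapse's actual code.

```python
# Figures quoted in this issue, expressed in milliseconds.
# The names are hypothetical and may not match Synapse's own constants.
FEDERATION_PING_INTERVAL_MS = 25 * 60 * 1000
FEDERATION_TIMEOUT_MS = 30 * 60 * 1000


def ping_due(now_ms: int, last_federation_update_ms: int) -> bool:
    """Return True once 25 minutes have passed since the last update we sent."""
    return now_ms - last_federation_update_ms >= FEDERATION_PING_INTERVAL_MS


def remote_considers_offline(now_ms: int, last_received_ms: int) -> bool:
    """Return True once the remote side has heard nothing for 30 minutes."""
    return now_ms - last_received_ms >= FEDERATION_TIMEOUT_MS
```

Because the ping interval is shorter than the timeout, a user whose state has not changed is re-announced before the remote side would time them out. Under this sketch, users whose last update fell in the same window all come due at roughly the same moment, which would be consistent with the periodic spike shown in the graphs.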

Federation spike 1

This appears to be the replication notifier system ramping up and queueing a bunch of federation sending requests over approximately 1 minute (give or take a few seconds).

Images: ![Federation spike 2](https://github.com/matrix-org/synapse/assets/1582365/6042747f-3a96-439c-9123-36ad500b1006) ![Federation spike 3](https://github.com/matrix-org/synapse/assets/1582365/260d263f-0bd0-419a-88dc-ea6ae52fa1df) ![Federation spike 4](https://github.com/matrix-org/synapse/assets/1582365/7ece6833-c9d3-4f26-90b9-78a08b483d3a)

There is a database hit during this to get_current_hosts_in_room(). I'm not personally convinced it's contributing to the seriousness of this situation, but it is included here for completeness.

Images: ![Federation spike 5](https://github.com/matrix-org/synapse/assets/1582365/2cdee10a-590b-49a9-a7a8-bab73a48f439)

UPDATE: Additional information from the other side of the slash in the title

The large spike in traffic caused by queueing and then sending all those requests looks like this: Federation spike 7

realtyem commented 11 months ago

This can be closed, as it is the nature of how presence updates are sent. Changing this would require a spec change to use a mesh network for communication, which would probably be a privacy violation (never mind how brittle it could end up). Someone else's homeserver shouldn't be handling or forwarding my presence for me.
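As a rough sketch of why the fan-out is inherent: assuming a user's presence goes directly to every remote server that shares a room with them, the set of destinations could be computed as below. The function name and the plain-dict data layout are hypothetical, used only for this example; they are not Synapse's API.

```python
from typing import Dict, Iterable, Set


def presence_destinations(
    user_rooms: Iterable[str],
    hosts_in_room: Dict[str, Set[str]],
    own_server: str,
) -> Set[str]:
    """Collect every remote host that shares at least one room with the user."""
    destinations: Set[str] = set()
    for room_id in user_rooms:
        # Union of the hosts in each room the user has joined.
        destinations.update(hosts_in_room.get(room_id, set()))
    # We never send a federation request to ourselves.
    destinations.discard(own_server)
    return destinations
```

Each host in the resulting set gets its own request from the user's homeserver, so a user in large public rooms means many outbound requests per ping, with no other server relaying on their behalf.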