Open Cocam123 opened 5 months ago
Did you update along with restore? Maybe related to: https://github.com/element-hq/synapse/issues/17129
Nope. It was up to date before the restore (I always try to keep it up to date; security is one of my priorities).
But yeah, if there is a way to fix it quickly, I won't say no. The resources are so saturated that it's impossible to use (the server is supposed to be online).
I updated Synapse (I'm now on 1.106). The problem isn't fixed.
@Cocam123 try rolling back to v1.104.0 or whatever version you were using before. For me database performance issues started with v1.105.0.
how do I do that?
I installed Synapse with apt install matrix-synapse-py3
Okay, I found the package, but it seems like it doesn't work.
Did you possibly restore the backup multiple times, e.g. partially restore it once, run into an error or interrupt it manually, then restore it again without clearing the database that you (partially) restored to the first time?
Edit: to give some context, I ask because this is known to cause problems, as it can lead to duplicate rows. https://github.com/matrix-org/synapse/issues/11779 was a previous example, though it looks like that particular case got patched.
Another suggestion is to ANALYZE your database in Postgres, in case the restore disrupted the statistics and is leading to poor query plans being generated. I am not sure if that's a realistic problem, but it came to mind.
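For reference, here is a minimal sketch of what running ANALYZE could look like. It only generates the SQL statements, which can then be pasted into psql while connected to the Synapse database. The helper names (`quote_ident`, `analyze_statements`) are made up for this example, and the two table names are just illustrations; calling it with no tables produces a plain database-wide ANALYZE.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def analyze_statements(tables=()):
    """Return ANALYZE statements to refresh planner statistics.

    With no tables given, a single database-wide ANALYZE is returned.
    """
    if not tables:
        return ["ANALYZE VERBOSE;"]
    return [f"ANALYZE VERBOSE {quote_ident(t)};" for t in tables]


if __name__ == "__main__":
    # Table names here are illustrative; pass an empty list to cover everything.
    for stmt in analyze_statements(["events", "state_groups_state"]):
        print(stmt)
```

ANALYZE only updates statistics and does not rewrite tables, so it should be much cheaper than VACUUM FULL or REINDEX.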
I completely reinstalled everything (after a reset). The server is working, but it's extremely slow. The CPU is overloaded and disk usage keeps growing (when I restart Synapse, it frees the storage up again).
If you definitely restored your database from fresh in one attempt then that clears the first point at least :)
Did you try to ANALYZE the tables in your database in the end? I still can't really see what else would have changed by doing a database restore, other than maybe coming online after some downtime is causing trouble.
I know it's a faff but using the Prometheus metrics + Grafana dashboard can help to have a little bit more idea of where the time is going.
As mentioned earlier, there is a suspected (or is it safe to say 'known'?) performance regression in this version (https://github.com/element-hq/synapse/issues/17129), but if you were already running that version before then I don't see why that would be the problem.
Hey! Okay, I installed Prometheus and Synapse. What information might be of interest? I see the federation graphs, but I don't know what else to send that could point to the cause of the problem.
Okay so I decided to upload everything I have now. The server doesn't reply atm (at least, not correctly)
I've also run VACUUM ANALYZE on the entire database, but it didn't help.
It found dead rows, but despite the ANALYZE and a VACUUM FULL VERBOSE, the server's situation didn't improve.
Thanks for your graphs!
I notice that 'Age of oldest event in staging area' is high (2+ days) and that you have ~500 events in the staging area and this number seems to not be decreasing.
This probably means that your server is struggling to persist the events.
Since the CPU usage doesn't look very high, I guess something is going slowly in the database? (Is Postgres the source of your high CPU use on your server?)
If you open up the 'Database' section in the Grafana dashboard, that probably has some interesting info, if you don't mind?
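To check whether Postgres itself is busy, one option is to look at the `pg_stat_activity` view for long-running queries. The following is a rough sketch only: it assumes the third-party `psycopg2` package is installed and that you supply your own connection string; `fetch_active_queries` and `format_rows` are names invented for this example.

```python
# Longest-running non-idle queries, excluding this session itself.
ACTIVE_QUERIES_SQL = """
SELECT pid, state, now() - query_start AS runtime, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle' AND pid <> pg_backend_pid()
ORDER BY runtime DESC NULLS LAST
LIMIT 10;
"""


def fetch_active_queries(dsn):
    """Run the query above and return the rows (needs psycopg2 and a live database)."""
    import psycopg2  # third-party; imported lazily so the rest works without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(ACTIVE_QUERIES_SQL)
            return cur.fetchall()
    finally:
        conn.close()


def format_rows(rows):
    """Render (pid, state, runtime, query) rows as tab-separated lines."""
    return "\n".join(
        f"{pid}\t{state}\t{runtime}\t{query}" for pid, state, runtime, query in rows
    )
```

Usage would be something like `print(format_rows(fetch_active_queries("dbname=synapse user=synapse_user")))`, where the database name and user are placeholders. The same SQL can also be pasted straight into psql.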
Description
Since I restored a backup of my database, I've been unable to connect to synapse matrix. It simply takes up all the RAM and CPU available on the server.
I've talked about it on the matrix channel, but we haven't managed to solve the problem.
I tried REINDEX and VACUUM FULL. I also disabled presence and changed some federation parameters in homeserver.yaml:
federation:
  destination_min_retry_interval: 1m
  destination_retry_multiplier: 5
  destination_max_retry_interval: 365d
After a reboot, the same problems occur.
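To illustrate what those three federation settings imply, here is a small sketch of the retry schedule they would give for a destination that keeps failing. It assumes a plain geometric backoff (each interval is the previous one times the multiplier, capped at the maximum); Synapse's actual implementation may differ, for example by adding jitter. `retry_schedule` is a name made up for this example.

```python
from datetime import timedelta


def retry_schedule(min_interval, multiplier, max_interval, failures):
    """Return the wait before each retry under a simple capped geometric backoff."""
    intervals = []
    current = min_interval
    for _ in range(failures):
        intervals.append(current)
        # Grow the interval, but never beyond the configured maximum.
        current = min(current * multiplier, max_interval)
    return intervals


# Values taken from the homeserver.yaml snippet above.
schedule = retry_schedule(
    min_interval=timedelta(minutes=1),
    multiplier=5,
    max_interval=timedelta(days=365),
    failures=10,
)

if __name__ == "__main__":
    for attempt, wait in enumerate(schedule, start=1):
        print(f"retry {attempt}: wait {wait}")
```

Under this assumption the intervals go 1 minute, 5 minutes, 25 minutes and so on, reaching the 365-day cap on the tenth consecutive failure, so unreachable destinations are quickly left alone.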
Steps to reproduce
Homeserver
matrix.cocamserverguild.com
Synapse Version
1.105.1
Installation Method
Debian packages from packages.matrix.org
Database
PostgreSQL (single one, restored from a backup)
Workers
Single process
Platform
It's running on a Debian machine on a VPS.
2 CPU, 4 GB RAM
Configuration
No response
Relevant log output
Anything else that would be useful to know?
No response