Please compress the database dump

MayeulC commented 1 year ago

Describe the bug

My synapse database is big. Database dumps can take more than 100 GB. My latest one (single user server) is at 32.8 GB. Writing and reading that file to disk takes a long time, not to mention the wasted disk space.

Backups used to be compressed, but back then they were first tar-ed, then compressed. Both stages took a while.

Expected gains

zstd level 3 gets an old dump from 17GB to 3.8GB. Level 7 only gets this down to 3.5GB. Level 1 (minimum, fastest) reaches 4.2 GB.

Both archive creation and dump time should be faster as less data needs to be written to disk, especially for hard disks.

Sample runs

These runs were collected with some I/O in the background (synapse restoration in progress).

# time cat tmp/apps/synapse/backup/dump.sql |zstd -T0 -1 > test1.zst
cat tmp/apps/synapse/backup/dump.sql  0,20s user 20,29s system 14% cpu 2:24,87 total
zstd -T0 -1 > test1.zst  110,79s user 10,31s system 83% cpu 2:24,89 total
# time cat tmp/apps/synapse/backup/dump.sql |zstd -T0 -3 > test3.zst
cat tmp/apps/synapse/backup/dump.sql  0,25s user 15,99s system 10% cpu 2:32,51 total
zstd -T0 -3 > test3.zst  120,20s user 7,80s system 83% cpu 2:32,58 total
# time cat tmp/apps/synapse/backup/dump.sql |zstd -T0 -7 > test7.zst
cat tmp/apps/synapse/backup/dump.sql  0,17s user 16,89s system 8% cpu 3:16,85 total
zstd -T0 -7 > test7.zst  630,73s user 7,80s system 324% cpu 3:17,04 total
# time cat tmp/apps/synapse/backup/dump.sql > test0.sql
cat tmp/apps/synapse/backup/dump.sql > test0.sql  0,17s user 21,84s system 10% cpu 3:33,71 total

This suggests that level 3 is probably good enough, and compressing barely adds any time to the operation, at least on my (powerful, relatively slow disk for the backup partition) machine.

lapineige commented 1 year ago

This suggests that level 3 is probably good enough

According to my personal benchmarks, it is. Higher levels result in high speed loss compare to a minimal compression gain, lower level results in minimal speed gains for not that much improved speed. Even on a Raspberry Pi, not a powerful computer.

and compressing barely adds any time to the operation, at least on my (powerful, relatively slow disk for the backup partition) machine.

This will reduce most of the time speed up the backup, as you will write massively less data with a small CPU overhead.

Gredin67 commented 1 year ago

Is compression compatible with borg incremental backups? What do you think @zamentur ?

MayeulC commented 1 year ago

It's probably compatible (partially de-duplicable) if compressing with --rsyncable, as offered by zstd and gzip: https://github.com/facebook/zstd/issues/1155

I'm not sure about borg's deduplication algorithm, but I imagine it works similarly to rsync, chunking files based on a rolling hash. This should work with --rsyncable.

Edit: yes, it should work. borg uses a rolling hash (buzhash/cyclic polynomial), so it creates content-based chunks. If large portions of input data do not change, zstd --rsyncable will output the same compressed data, which can then be de-duplicated.

Edit 2: example use-case with borg and gzip --rsyncable: https://gitlab.archlinux.org/archlinux/infrastructure/-/merge_requests/429

lapineige commented 1 year ago

Is compression compatible with borg incremental backups?

What do you mean by "compatible" ? The file will be backed-up, either compressed or not. But if it will probably be recognized as a different file, hence not de-duplicated. The backup works, but you loose storage.

lapineige commented 1 year ago

:up: :slightly_smiling_face:

MayeulC commented 1 year ago

But if it will probably be recognized as a different file, hence not de-duplicated

Borg works per chunk (part of a file), not per file. Creating an archive with zstd --rsyncable should produce as many identical chunks as possible, so it will get deduplicated (and can be rsynced faster).

With the new synapse upgrade, I hit this again (38GB). Since the upgrade failed, I am now 3 hours in, and counting.

zamentur commented 1 year ago

Very nice suggestion (db backup compression). We should have that on each apps i guess.

zamentur commented 1 year ago

So i added a ticket for that in yunohost issue tracker cause it's a general usecase for all backup, and it could be very useful.

On my side i find an other way to deal with it in an other way: reduce the size of the db ^^ https://levans.fr/shrink-synapse-database.html

lapineige commented 1 year ago

Well imho, all the backups (parts that could be significantly compressed) should be compressed by default, or at least we should have that possibility… well, we discussed that on the forum.

But yeah, lacking that feature, compressing db export just makes a lot of sense, as it's often very easy to compress and as the storage gain can be very significant. Integrating this into the core would be a very nice improvement. (restore the compressed backup functionality too 😇)

tio-trom commented 1 year ago

Exactly. My database is 200GB so I can never even update Synapse from the YNH panel since it will take an hour or so. I do it via SSH with the option to skip the backups simply because of the database being enormous. And when Borg backs-up it takes 200GB on my server space before uploading to Borg, which is again massive. Synapse is the only YNH app that we have which makes it not possible to deal with via YNH Panel properly.

lapineige commented 1 year ago

Out of curiosity, how did it grow to such an enormous value ? Are files and media included in the DB ?

tio-trom commented 1 year ago

Out of curiosity, how did it grow to such an enormous value ? Are files and media included in the DB ?

I store the media in a separate folder that's just 14GB in size. We have around 500 users. Maybe that's why!? Is there a way to see what takes up so much?

EDIT: I see this https://matrix-org.github.io/synapse/latest/usage/administration/database_maintenance_tools.html and this https://levans.fr/shrink-synapse-database.html - I wonder how to get the access token with the YNH install. Seems a bit of work but I will try it. If anyone knows a more simple solution, like a script that does these automatically, please let me know.

Uhh

Table Name	Total	Index	Table
state_groups_state	262 GB	96 GB	166 GB
event_json	10 GB	727 MB	9794 MB
events	5928 MB	4052 MB	1876 MB
event_edges	3376 MB	2104 MB	1272 MB
device_lists_changes_in_room	2803 MB	1607 MB	1196 MB
event_auth	2773 MB	1029 MB	1744 MB
received_transactions	2636 MB	1391 MB	1245 MB
event_search	2113 MB	877 MB	1236 MB
room_memberships	1613 MB	934 MB	679 MB
event_to_state_groups	1230 MB	722 MB	508 MB
current_state_delta_stream	892 MB	116 MB	776 MB
event_auth_chain_links	850 MB	346 MB	504 MB
receipts_linearized	846 MB	785 MB	61 MB
state_events	805 MB	294 MB	511 MB
event_auth_chains	755 MB	447 MB	308 MB
device_inbox	606 MB	118 MB	488 MB
event_relations	554 MB	340 MB	214 MB
device_lists_stream	487 MB	415 MB	72 MB
current_state_events	471 MB	334 MB	137 MB
presence_stream	441 MB	418 MB	23 MB
cache_invalidation_stream_by_instance	411 MB	366 MB	45 MB
stream_ordering_to_exterm	384 MB	312 MB	72 MB
device_lists_remote_cache	279 MB	32 MB	247 MB
state_groups	264 MB	110 MB	155 MB
e2e_cross_signing_keys	201 MB	23 MB	179 MB
users_in_public_rooms	185 MB	127 MB	58 MB
user_directory_search	134 MB	76 MB	58 MB
state_group_edges	127 MB	68 MB	59 MB
receipts_graph	110 MB	55 MB	55 MB
user_directory	103 MB	61 MB	41 MB
event_push_actions	39 MB	35 MB	3744 kB
device_lists_remote_pending	37 MB	21 MB	16 MB
device_lists_outbound_pokes	33 MB	20 MB	12 MB
redactions	29 MB	17 MB	12 MB
e2e_room_keys	29 MB	4616 kB	24 MB
remote_media_cache_thumbnails	24 MB	11 MB	12 MB
users_who_share_private_rooms	22 MB	17 MB	4888 kB
server_keys_json	18 MB	2424 kB	16 MB
e2e_one_time_keys_json	14 MB	3864 kB	10 MB
destination_rooms	10088 kB	7416 kB	2672 kB
event_backward_extremities	9168 kB	6464 kB	2704 kB
device_lists_remote_extremeties	8872 kB	3496 kB	5376 kB
remote_media_cache	6832 kB	2104 kB	4728 kB
server_signature_keys	5392 kB	1648 kB	3744 kB
event_failed_pull_attempts	5336 kB	1952 kB	3384 kB
event_forward_extremities	5080 kB	4880 kB	200 kB
user_daily_visits	4784 kB	1720 kB	3064 kB
local_media_repository	4168 kB	2904 kB	1264 kB
federation_inbound_events_staging	4048 kB	3768 kB	280 kB
local_media_repository_thumbnails	3648 kB	2520 kB	1128 kB
ex_outlier_stream	3504 kB	768 kB	2736 kB
user_ips	3056 kB	2448 kB	608 kB
device_lists_outbound_last_success	2848 kB	1160 kB	1688 kB
threads	1776 kB	1008 kB	768 kB
destinations	1736 kB	568 kB	1168 kB
room_account_data	992 kB	736 kB	256 kB
device_federation_inbox	992 kB	392 kB	600 kB
user_directory_stale_remote_users	960 kB	648 kB	312 kB
event_txn_id	896 kB	792 kB	104 kB
account_data	680 kB	352 kB	328 kB
local_current_membership	640 kB	320 kB	320 kB
devices	616 kB	168 kB	448 kB
e2e_cross_signing_signatures	472 kB	136 kB	336 kB
stream_positions	472 kB	40 kB	432 kB
access_tokens	424 kB	224 kB	200 kB
event_txn_id_device_id	416 kB	312 kB	104 kB
event_auth_chain_to_calculate	416 kB	288 kB	128 kB
device_lists_remote_resync	408 kB	208 kB	200 kB
event_push_summary	376 kB	152 kB	224 kB
e2e_device_keys_json	376 kB	56 kB	320 kB
push_rules_stream	304 kB	104 kB	200 kB
event_push_actions_staging	256 kB	232 kB	24 kB
e2e_fallback_keys_json	256 kB	64 kB	192 kB
profiles	248 kB	120 kB	128 kB
room_stats_state	248 kB	64 kB	184 kB
rooms	232 kB	120 kB	112 kB
room_stats_current	224 kB	64 kB	160 kB
local_media_repository_url_cache	224 kB	152 kB	72 kB
e2e_room_keys_versions	216 kB	16 kB	200 kB
users	216 kB	88 kB	128 kB
user_filters	200 kB	88 kB	112 kB
worker_read_write_locks	192 kB	136 kB	56 kB
worker_read_write_locks_mode	184 kB	128 kB	56 kB
pushers	184 kB	64 kB	120 kB
partial_state_rooms_servers	184 kB	136 kB	48 kB
room_depth	176 kB	64 kB	112 kB
open_id_tokens	176 kB	88 kB	88 kB
device_federation_outbox	176 kB	32 kB	144 kB
user_stats_current	168 kB	56 kB	112 kB
rejections	168 kB	56 kB	112 kB
push_rules_enable	152 kB	80 kB	72 kB
worker_locks	152 kB	104 kB	48 kB
push_rules	152 kB	72 kB	80 kB
ui_auth_sessions_ips	128 kB	80 kB	48 kB
ui_auth_sessions	120 kB	56 kB	64 kB
user_threepids	112 kB	48 kB	64 kB
ui_auth_sessions_credentials	104 kB	56 kB

tio-trom commented 1 year ago

I think we need this tool to compress the Synapse table https://github.com/matrix-org/rust-synapse-compress-state - even the devs recognized this is an issue and made this tool. Any way to have it packaged for YNH?

tio-trom commented 1 year ago

Ok after days of optimization I managed this:

Relation	Total Size
public.state_groups_state	76 GB
public.event_json	10 GB
public.events	4327 MB
public.event_edges	2910 MB
public.device_lists_changes_in_room	2482 MB
public.event_auth	2064 MB
public.event_search	1778 MB
public.room_memberships	1171 MB
public.event_to_state_groups	971 MB
public.current_state_delta_stream	822 MB
public.event_auth_chain_links	761 MB
public.state_events	730 MB
public.received_transactions	674 MB
public.event_auth_chains	632 MB
public.event_relations	424 MB
public.current_state_events	297 MB
public.device_lists_remote_cache	220 MB
public.device_inbox	204 MB
public.e2e_cross_signing_keys	196 MB
public.state_groups	189 MB

Still a lot! But well 3 times less the size. What I did?

First compress using this https://github.com/matrix-org/rust-synapse-compress-state#building . Need to install that package and run it like:

nohup synapse_auto_compressor -p postgresql://matrix_synapse:db-password-here@localhost/matrix_synapse -c 500 -n 10000 >/var/log/matrix-opt.log 2>&1 &

To run in the background.

The I had to REINDEX:

PGPASSWORD="db-password-here" nohup psql -U matrix_synapse -d matrix_synapse -c "REINDEX (VERBOSE) DATABASE matrix_synapse;" >/var/log/matrix-reindex.log 2>&1 &

And then to VACUUM:

PGPASSWORD="db-password-here" nohup psql -U matrix_synapse -d matrix_synapse -c "VACUUM FULL VERBOSE ANALYZE;" >/var/log/matrix-vacuum.log 2>&1 &

The compression took around 3 days of nonstop compressing and put a huge toll on the CPU. 10 core CPU ad 100%. The rest take around 3-4 hours each. And will take a lot of space on your disc, it will duplicate your database on the server before doing the reindex and vacuuming. Then it will delete it.

I do not know if I can do more to reduce the size. But even at this size it is not easy to manage it via the YNH panel....too big to backup, too big to restore.

lapineige commented 1 year ago

Wow, thanks a lot for documenting this !

Did you stopped the synapse service meanwhile ?

tio-trom commented 1 year ago

You have to stop synapse while you do the Reindex and Vaccum full. Not when you do the compression. Vaccum full is just to restore the diskspace to the server. Basically the compression can say cut down your database from 200GB to 100GB but you wont see that on your disk because postgresql still shows 200 GB and keeps the "empty" space for the database so it will add new stuff to this empty space. idk how to explain basically your database wont grow anymore and the diskspace looks like 200 GB is full with the synapse database....

lapineige commented 1 year ago

It's clear for me, thanks :)

MayeulC commented 1 year ago

Yeah, I used a similar tool a while ago, I also used to have a DB of more than 100G, with myself as the only user 😇

I am uncertain if compression would help a lot with the backup/restore speed, it may be 30% faster but would still likely take hours. However, it would help quite a bit with disk space.

psql -U matrix_synapse

You can also do sudo -u postgres psql to avoid having to deal with authentication.

It's also possible to drop old room history from the database, for rooms not originating on one's server. The assumption is that synapse will fetch it again from distant servers if needed. But we stray from the topic :)

lapineige commented 1 year ago

it may be 30% faster but would still likely take hours. However, it would help quite a bit with disk space.

30% of a couple of hours is already a big deal !

Gredin67 commented 11 months ago

rather than compressing the database dump, why not compress the database itself ?

[ ] install https://github.com/matrix-org/rust-synapse-compress-state
[ ] Run cron tasks to either compress or cleanup old history of big rooms
[ ] Run cron jobs to clean up media and messages

https://wiki.chatons.org/doku.php/services/messagerie_instantanee/matrix#nettoyage_d_un_serveur_synapse https://jacksonchen666.com/posts/2022-12-03/14-33-00/

The other advantage would be that the synapse_ynh package could kind of "guarantee" that the messages and media retention policy is actually applied by actually cleaning, purging and freeing space in the DB

lapineige commented 11 months ago

Wouldn't that hurt performances ?

OniriCorpe commented 11 months ago

I think we need this tool to compress the Synapse table https://github.com/matrix-org/rust-synapse-compress-state - even the devs recognized this is an issue and made this tool. Any way to have it packaged for YNH?

maybe the most easiest way would be to just put the executable in the installation path at installation, and upgrade this tool at the same time of synapse if a new version is released?

Gredin67 commented 3 months ago

all people having issues with huge database should check this https://github.com/YunoHost-Apps/synapse_ynh/issues/478

YunoHost-Apps / synapse_ynh