cyrusimap / cyrus-imapd

Cyrus IMAP is an email, contacts and calendar server
http://cyrusimap.org
Other
543 stars 149 forks source link

backups of shared mailboxes -- finer granularity #2915

Open elliefm opened 4 years ago

elliefm commented 4 years ago

Instead of backing up all shared mailbox as a single '%SHARED' backup, bundle them up at a deeper level (e.g. each top-level shared mailbox is its own "userid" with its own backup). Perhaps have a config option for the depth this should occur at, which would default at 0 for "all shared mailboxes in a single backup" like the current behaviour, 1 would be the behaviour described above, and larger numbers would produce more, finer grained backup files. You would choose a value that brings your worst case in line with normal behaviour.

This could help with cases like Deborah's where the single %SHARED backup is many times larger than a typical user backup, which makes tuning the backup configuration clumsy because it can't be tuned well for both cases.

I think this should be safe to enable on an existing system -- you'd end up with a bunch of new %SHARED.foo backups in addition to the existing %SHARED backup, and could then delete the old %SHARED backup once you're sure you don't need it anymore.

elliefm commented 4 years ago

Need to consider whether mailboxes like "#calendars" and "#addressbooks" are suitable split points, or should always be treated as part of their parent; but also how to handle them if we want one-backup-per-top-level-shared but some of the top-level-shareds are like "#calendars", "#addressbooks", etc!

elliefm commented 4 years ago

WIP is here, just the trivial (0 or 1) implementation so far: https://github.com/elliefm/cyrus-imapd/tree/v31/2915-shared-mailbox-backup-granularity

I'm working against master, but it should backport trivially onto 3.0 since the backups code is the same(?) on both.

elliefm commented 4 years ago

Tripped over #2920 while testing this, so that's fixed now

elliefm commented 4 years ago

Added a commit that takes a first stab at supporting values >1, but don't have even a basic "this should work" test for it yet. I feel like there are also some pathological edge cases I need to poke it with, but I don't quite see the shape of them yet.

futzle commented 3 years ago

I (finally) spent some time with this patch, applied on 3.2.5, and can make the following observations with backup_shared_mailbox_granularity = 1

Edit: I'm going to try to amend the code to:

elliefm commented 3 years ago

I just found and fixed(?) a memory leak while I was in there thinking about this. The branch doesn't build for me at the moment... I'll probably need to rebase it onto current master. Will that get in your way?

I guess it hadn't really occurred to me that shared mailboxes could have domains. I haven't seen them used much before!

I'm guessing that the mailbox name is created the first time there is a transfer and not freed/recreated when the next mailbox is transferred, even if it starts with a different string.

Well, cmd_apply_reserve in backup/backupd.c assumes there's only one shared mailbox backup, so even though it calculates the first shared mailbox backup correctly, it only does it for the first one. This won't be the cause of the XBACKUP problem though, because XBACKUP sends one mailbox at a time.... But backupd_open_backup (same file) thinks every backup has a userid and tries to use that as a key to track which backups it has open, which means it ends up tracking the first shared backup opened as if it were the only shared backup (because they all have a NULL userid).... So there's the bug, but I guess it's not so much a "bug" as it is "I didn't finish implementing it", doh!

If you're still enthusiastic to try patching this I won't get in your way, but if you want to handball it back to me, that's pretty reasonable!

futzle commented 3 years ago

Think I might have to handball this back to you. It's not a problem to use a different branch; the backup server I'm using has no other purpose.