SovereignCloudStack / issues

This repository is used for issues that are cross-repository or not bound to a specific repository.
https://github.com/orgs/SovereignCloudStack/projects/6

[Feature Request] Controlled mon backups #529

Open poelzi opened 5 months ago

poelzi commented 5 months ago

Verified mon backups are required

Since new OSD deployments should use full disk encryption, restoring the decryption keys becomes a very important aspect of cluster recovery. Because the built-in Ceph encryption technique uses a secret that is stored in the mon RocksDB database, a corrupt mon database cannot be recovered in certain scenarios.

The Troubleshooting Monitors documentation suggests restoring the last osdmap/...map by scanning the OSD drives and merging their output. Since OSDs cannot be decrypted without the mon, this restoration process is not reliable.

Also, not all maps are recovered correctly, which causes some services not to work properly (pool permissions, for example).

The proper way would be to use the RocksDB BackupEngine to create verified mon backups that can be restored via a command-line start argument.
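
To illustrate what "verified" means here, the following is a minimal sketch in plain Python. It is not the RocksDB BackupEngine and not the proposed Ceph implementation: it only copies a directory, records SHA-256 checksums in a manifest, and checks them again before a restore would be attempted. The function names and the manifest file name are hypothetical, and copying the files of a live database this way is not assumed to yield a consistent snapshot.

```python
import hashlib
import json
import shutil
from pathlib import Path

# Hypothetical manifest name, chosen only for this sketch.
CHECKSUM_FILE = "backup-checksums.json"


def _sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_backup(store_dir: Path, backup_dir: Path) -> None:
    """Copy every file under store_dir and record its checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    checksums = {}
    for src in sorted(p for p in store_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(store_dir)
        dst = backup_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        checksums[rel.as_posix()] = _sha256(src)
    (backup_dir / CHECKSUM_FILE).write_text(json.dumps(checksums, indent=2))


def verify_backup(backup_dir: Path) -> bool:
    """Return True only if every recorded file exists and matches its checksum."""
    checksums = json.loads((backup_dir / CHECKSUM_FILE).read_text())
    return all(
        (backup_dir / rel).is_file() and _sha256(backup_dir / rel) == digest
        for rel, digest in checksums.items()
    )
```

A restore step would call `verify_backup` first and refuse to proceed if it returns `False`.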

The backup should be triggered regularly and whenever a new decryption key is stored in the mon database.
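
The trigger rule above can be sketched as a small policy object. This is only an illustration under stated assumptions: the class `BackupTrigger` and its method names are hypothetical and do not correspond to any existing Ceph interface. A backup is considered due when the interval has elapsed or when a new key has been stored since the last backup.

```python
import time


class BackupTrigger:
    """Hypothetical policy: back up on a timer and after each new key."""

    def __init__(self, interval_s: float, clock=time.monotonic):
        self._interval_s = interval_s
        self._clock = clock            # injectable so the policy can be tested
        self._last_backup = clock()
        self._key_pending = False

    def note_key_stored(self) -> None:
        """Record that a new decryption key was written to the store."""
        self._key_pending = True

    def should_backup(self) -> bool:
        """True if a key is pending or the regular interval has elapsed."""
        elapsed = self._clock() - self._last_backup
        return self._key_pending or elapsed >= self._interval_s

    def mark_backup_done(self) -> None:
        """Reset the timer and clear the pending-key flag."""
        self._last_backup = self._clock()
        self._key_pending = False
```

Keeping the decision separate from the backup itself means the same check could be driven from a periodic tick or from the code path that stores a key.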

Upstream tracking

Ceph Issue #63801 asked for feedback regarding the implementation but no feedback was given.

I have started to implement the functionality, but tests and a trigger through the admin socket are still missing.

artificial-intelligence commented 4 months ago

Ceph Issue #63801 asked for feedback regarding the implementation but no feedback was given.

Regarding this: it might be worth asking in the #ceph IRC channel on irc.oftc.net and on the ceph-devel or user mailing list; the Ceph issue tracker is not well suited to answering such questions.

artificial-intelligence commented 4 months ago

Maybe I'm out of the loop, so forgive my possibly silly question, but:

Since new OSD deployments should use full disk encryption[..]

Do we actually mandate that? Is that a new upstream default I'm not aware of? Afaik many deployments run without full disk encryption?

Regardless of the need for full disk encryption: Could you clarify if I still need a backup of the mon DB when my OSD is not encrypted at all?

Just so I can better understand this feature.

Also: Is there any public repository where one could look at some code already? Or if not, at least some design document or something? An architecture diagram maybe?

Thanks

poelzi commented 4 months ago

Backing up the mon database is highly encouraged; otherwise you cannot recover all of the information held in the mon. The osdmaps and some other maps are stored in the OSD cluster, but auth information is not, so authentication permissions set by the user are lost. This is the procedure you can try, but in my experience it is painful and the cluster does not end up the same: https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#monitor-store-failures

poelzi commented 3 months ago

I opened the main PR: https://github.com/ceph/ceph/pull/56772 and, as a side product, https://github.com/ceph/ceph/pull/56751