Closed benaryorg closed 1 month ago
@benaryorg Could you test whether the patch fixes it?
Neither I nor any NixOS tests use ceph-volume, so it's not so easy to test for me.
If it fixes it for you, we can merge and backport a fix quickly.
Afterwards we should also write a NixOS VM test for this, so that you're protected from this via automation!
@benaryorg For convenience of testing:
@nh2 I started doing some testing, but given how long Ceph takes to compile (don't get me started) I just shoved it into my hydra and stopped caring.
Turns out I shouldn't have picked the linked 607eb34b2c278566c386efcbf3018629cf08ccfd from main, but instead 5df13b4197a10f0209a535a30ca9b9e5e6a12fdb, which is the patch that was backported onto the reef branch (with no release yet), so the patch didn't apply and me doing something else for an hour while hydra idled around was for naught.
I just grabbed the code from your linked PR and shoved it into my overlay (testing this on 24.05 still), and the build process is looking good (IPv6 only) for now. Considering that building Ceph on my local 24 cores took half an hour, building with that hydra will likely take about two hours or more, so while I'll have my eyes on it for a bit to make sure it runs properly, I'll probably go to bed and get back to you in ~9h when it's built (at which point I'll be able to pull the exact build from hydra onto the server, making sure that the exact patches work).
Okay, the local 24 cores were a bit faster (the hashes still match so it is the same build that my hydra is still struggling with), but now I'm getting this error which is ever so slightly different, but still hints in the same direction:
[2024-08-13 04:23:30,566][ceph_volume.process][INFO ] Running command: /run/current-system/sw/bin/cryptsetup --version
[2024-08-13 04:23:30,571][ceph_volume.process][INFO ] stdout cryptsetup 2.7.3 flags: UDEV BLKID KEYRING KERNEL_CAPI HW_OPAL
[2024-08-13 04:23:30,572][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/devices/lvm/main.py", line 46, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/devices/lvm/activate.py", line 283, in main
    self.activate(args)
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
           ^^^^^^^^^^^^^^
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/devices/lvm/activate.py", line 211, in activate
    activate_bluestore(lvs, args.no_systemd, getattr(args, 'no_tmpfs', False))
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/devices/lvm/activate.py", line 73, in activate_bluestore
    encryption_utils.set_dmcrypt_no_workqueue()
  File "/nix/store/mp66xzqrhjy86jmbzm98lrbjcxcwk6iv-ceph-18.2.4/lib/python3.11/site-packages/ceph_volume-1.0.0-py3.11.egg/ceph_volume/util/encryption.py", line 54, in set_dmcrypt_no_workqueue
    raise RuntimeError('Error while checking cryptsetup version.\n',
RuntimeError: ('Error while checking cryptsetup version.\n', '`cryptsetup --version` output:\n', 'cryptsetup 2.7.3 flags: UDEV BLKID KEYRING KERNEL_CAPI HW_OPAL ')
Arghs, yes, it does need this patch additionally: https://github.com/ceph/ceph/commit/607eb34b2c278566c386efcbf3018629cf08ccfd
Otherwise it will still use .match() instead of .search(), where the former only matches at the beginning of the string and the latter finds a match anywhere in it.
I'll pull in that patch too.
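For anyone following along, here is a minimal sketch of the match/search difference on the output line from the log above. The pattern is a hypothetical one chosen for illustration and is not taken from the Ceph source; only the behaviour of Python's `re.match` and `re.search` is the point.

```python
import re

# Output line copied from the log earlier in this thread.
OUTPUT = "cryptsetup 2.7.3 flags: UDEV BLKID KEYRING KERNEL_CAPI HW_OPAL"

# Hypothetical pattern for illustration only; NOT the regex used by ceph-volume.
VERSION = re.compile(r"\d+(?:\.\d+)+")

# match() only tries at index 0, where the string starts with "c", not a digit.
anchored = VERSION.match(OUTPUT)

# search() scans forward through the string until the pattern fits somewhere.
scanned = VERSION.search(OUTPUT)

print(anchored)          # None
print(scanned.group(0))  # 2.7.3
```

With a digits-only pattern, the anchored call can never succeed on this output because the line begins with the program name, which would explain the `RuntimeError` even when the version itself is fine.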
Edit: as someone who is aware of computational complexity and backtracking in regular expressions, the change described as "the regex pattern to more accurately capture version numbers" captures exactly the same version numbers, but with a lot more backtracking and no word boundaries, so it's literally the match/search difference that's relevant (although I appreciate the added tests). I just felt like I had to say that somewhere, because the old regex actually captured the pattern, while the new one is barely more than [0-9.]+.
diff --git a/config/host/haskell.home.bsocat.net/ceph.nix b/config/host/haskell.home.bsocat.net/ceph.nix
index 2106139..f860060 100644
--- a/config/host/haskell.home.bsocat.net/ceph.nix
+++ b/config/host/haskell.home.bsocat.net/ceph.nix
@@ -99,6 +99,43 @@
builtins.listToAttrs
];
+ nixpkgs.overlays = lib.mkAfter
+ [
+ (final: prev: let
+ new_patches = [
+ # Fixes mgr not being able to import `packaging` due to autotools >= 70.
+ # Remove once https://github.com/ceph/ceph/pull/58624 is merged, see
+ # https://github.com/NixOS/nixpkgs/pull/330226#issuecomment-2268421031
+ (final.fetchpatch {
+ url = "https://github.com/ceph/ceph/commit/8da2d857fa8fdfedd7aad0ca90e1780a3ed085c9.patch";
+ name = "ceph-mgr-python-fix-packaging-import.patch";
+ hash = "sha256-3Yl1X6UfTf0XCXJxgRnM/Js9sz8tS+hsqViY6gDExoI=";
+ })
+
+ # Fixes cryptesetup version parsing regex, see
+ # * https://github.com/NixOS/nixpkgs/issues/334227
+ # * https://www.mail-archive.com/ceph-users@ceph.io/msg26309.html
+ # * https://github.com/ceph/ceph/pull/58997
+ # Remove once we're on the next version of Ceph 18, when this should be in:
+ # https://github.com/ceph/ceph/pull/58997
+ (final.fetchpatch {
+ url = "https://github.com/ceph/ceph/commit/6ae874902b63652fa199563b6e7950cd75151304.patch";
+ name = "ceph-reef-ceph-volume-fix-set_dmcrypt_no_workqueue.patch";
+ hash = "sha256-r+7hcCz2WF/rJfgKwTatKY9unJlE8Uw3fmOyaY5jVH0=";
+ })
+ (final.fetchpatch {
+ url = "https://github.com/ceph/ceph/commit/607eb34b2c278566c386efcbf3018629cf08ccfd.patch";
+ name = "ceph-reef-ceph-volume-fix-set_dmcrypt_no_workqueue-regex.patch";
+ hash = "sha256-q28Q7OIyFoMyMBCPXGA+AdNqp+9/6J/XwD4ODjx+JXY=";
+ })
+ ];
+ in
+ {
+ ceph = prev.ceph.overrideAttrs ({ patches ? [], ... }: { patches = patches ++ new_patches; });
+ }
+ )
+ ];
+
benaryorg.prometheus.client.mocks.ceph =
{
port = 9283;
Looks like everything works with those three patches!
Thanks for testing, I merged the fixes in PR #334275!
@nh2 looking at the current master branch patches I do not see the mentioned third patch required to make this work as mentioned in both of my prior comments. Am I missing something or was the third patch omitted by mistake?
@benaryorg You're right, I missed the second patch, because the commit messages look so similar.
I'll make a PR to fix it.
Describe the bug
Ceph version 18.2.4 is currently not usable with dmcrypt volumes, and it has been backported to NixOS 24.05 (so it may break clusters shortly).
Steps To Reproduce
Steps to reproduce the behavior:
Run anything that uses ceph-volume lvm activate under the hood (with a dmcrypt'd bluestore).
I am experiencing this with 8f4cb508c33212aa69ae22958d03c0ba9a906f5b. I'm pretty sure #330226 (backported via #333401) is the culprit, but I haven't confirmed this yet.
Expected behavior
The bluestore volume is activated.
Additional context
The issue was raised on the ceph-users mailing list, referring to the lack of a backport for a commit that fixes this (supposedly, haven't tested that), which has been merged only to main as far as I can see.
This is the relevant log bit when run with the current NixOS 24.05 version:
(i.e. the version is parsed including the flags which breaks)
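A minimal sketch of that failure mode, under stated assumptions: `parse_dotted` and `version_from_output` are hypothetical helpers written for this illustration (not Ceph functions), the sample line is the `cryptsetup --version` output quoted elsewhere in this thread, and the minimum version is an illustrative value only. It shows that a strict dotted-number parser rejects the whole line, while parsing just the second token works.

```python
# Sample `cryptsetup --version` output as quoted in this thread.
OUTPUT = "cryptsetup 2.7.3 flags: UDEV BLKID KEYRING KERNEL_CAPI HW_OPAL"

# Illustrative threshold only; an assumption, not confirmed against Ceph.
HYPOTHETICAL_MINIMUM = (2, 3, 4)


def parse_dotted(text: str) -> tuple[int, ...]:
    """Hypothetical strict parser: accepts only dotted integers like '2.7.3'."""
    return tuple(int(part) for part in text.strip().split("."))


def version_from_output(line: str) -> tuple[int, ...]:
    """Parse only the second whitespace-separated token of the output line."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "cryptsetup":
        raise ValueError(f"unexpected cryptsetup output: {line!r}")
    return parse_dotted(tokens[1])


# Feeding the whole line (program name and flags included) to the strict parser fails.
try:
    parse_dotted(OUTPUT)
    whole_line_ok = True
except ValueError:
    whole_line_ok = False

parsed = version_from_output(OUTPUT)
print(whole_line_ok)                   # False
print(parsed)                          # (2, 7, 3)
print(parsed >= HYPOTHETICAL_MINIMUM)  # True
```

Comparing integer tuples avoids string comparison pitfalls such as "2.10" sorting before "2.9".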