cezarmathe / btrfsutil-rs

Safe wrappers for libbtrfsutil.
https://crates.io/crates/btrfsutil
MIT License
6 stars 5 forks source link

subvolume snapshot broken? #8

Open akrantz01 opened 4 years ago

akrantz01 commented 4 years ago

For some reason, I can't take a snapshot of a volume using Subvolume::snapshot. I keep getting a SearchFalied response no matter the path I provide as the destination. Here is the code I am using to take the snapshot:

Subvolume::get("/mnt/images/alpine")
  .unwrap()
  .snapshot("/mnt/containers/test/rootfs", None, None)?;

/mnt is a the root of a btrfs filesystem with a subvolume at images/alpine. When I try it using the btrfs command line, it works with no issue. Even when I call it using std::process::Command it works properly. Am I doing something wrong, or is it within the library?

Edit: I should also add that I am running the program as root, so permissions shouldn't be an issue?

cezarmathe commented 4 years ago

I think this is most likely a library issue. Right now, after issuing the snapshot creation we try to get the subvolume at that path https://github.com/cezarmathe/btrfsutil-rs/blob/97aea3044889b3c0984a3403dcc84f47f8d87ab5/src/subvolume/subvol.rs#L248 and that may be too fast. I think this could be fixed by using an async transaction id here https://github.com/cezarmathe/btrfsutil-rs/blob/97aea3044889b3c0984a3403dcc84f47f8d87ab5/src/subvolume/subvol.rs#L243 and then waiting on it(or don't wait, but that will be in the async version of the library).

ypoluektovich commented 3 years ago

I managed to reproduce the error (though it might be a different one) and figure out the cause.

First, version 0.1.0 was failing with the error "Could not search B-tree". I don't remember what exactly the reason was, but eventually I had a thought to upgrade to the current master (003fa612bbacad1ef4c54abc6d817f0eab4ec8c2) and that error went away.

Another one appeared instead: "Could not create subvolume", with errno (which I had to extract manually using nix, by the way; it might be a good idea to expose it somehow in this library's errors) having been set to EOPNOTSUPP. This one I managed to trace all the way down to the implementation of the ioctl function used to create snapshots.

The function btrfs_util_create_snapshot from libbtrfsutil accepts a *mut u64 and is supposed to write into it the BTRFS transaction id of the transaction in which the snapshot was created. However... https://github.com/kdave/btrfs-devel/blob/92477dd1faa650e50bd3bb35a6c0b8d09198cc35/include/uapi/linux/btrfs.h#L40-L41 ... the ioctl implementation in the kernel no longer supports that! The feature has been deprecated since 5.7, as evidenced by the linked comment. And, indeed, when I tried to call the generated binding (from btrfsutil-sys 1.2.1) directly without specifying the async, it worked.

cezarmathe commented 3 years ago

First, version 0.1.0 was failing with the error "Could not search B-tree". I don't remember what exactly the reason was, but eventually I had a thought to upgrade to the current master (003fa612bbacad1ef4c54abc6d817f0eab4ec8c2) and that error went away.

AFAIK that error was caused by attempting to always use a function that required superuser permissions https://github.com/cezarmathe/btrfsutil-rs/blob/375010d9aa41db96a44eb9b56792b18d5877cbae/src/subvolume/subvol.rs#L234 which uses https://github.com/cezarmathe/btrfsutil-rs/blob/375010d9aa41db96a44eb9b56792b18d5877cbae/src/subvolume/subvol.rs#L168

I think this was fixed on master, but there isn't any published version with a fix for this.

The function btrfs_util_create_snapshot from libbtrfsutil accepts a *mut u64 and is supposed to write into it the BTRFS transaction id of the transaction in which the snapshot was created. However... https://github.com/kdave/btrfs-devel/blob/92477dd1faa650e50bd3bb35a6c0b8d09198cc35/include/uapi/linux/btrfs.h#L40-L41 ... the ioctl implementation in the kernel no longer supports that! The feature has been deprecated since 5.7, as evidenced by the linked comment. And, indeed, when I tried to call the generated binding (from btrfsutil-sys 1.2.1) directly without specifying the async, it worked.

This is (or was?) the libbtrfsutil version supported by this library: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L28-L30

Here there is no sign that async transaction IDs are not supported: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L415-L431. Are you saying that support for async transactions has been dropped?

By the way, I looked over the implementation and I might have found an issue. https://github.com/cezarmathe/btrfsutil-rs/blob/003fa612bbacad1ef4c54abc6d817f0eab4ec8c2/src/subvolume/subvol.rs#L294-L306 https://github.com/cezarmathe/btrfsutil-rs/blob/003fa612bbacad1ef4c54abc6d817f0eab4ec8c2/src/subvolume/subvol.rs#L308 The transaction is set to 0 - essentially NULL (might've been better to use std::ptr::null() tbh). After that, the library uses btrfs_util_wait_sync to wait on a NULL async transaction id.

I haven't tested this, but I feel like there's something wrong with that.

Another one appeared instead: "Could not create subvolume", with errno (which I had to extract manually using nix, by the way; it might be a good idea to expose it somehow in this library's errors) having been set to EOPNOTSUPP.

At the moment the library returns the error as reported by libbtrfsutil (enum btrfs_util_error). How do you suggest integrating the errno as well in the reported errors?

Thanks for sharing your findings!

ypoluektovich commented 3 years ago

This is (or was?) the libbtrfsutil version supported by this library: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L28-L30

Here there is no sign that async transaction IDs are not supported: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/btrfsutil.h#L415-L431. Are you saying that support for async transactions has been dropped?

libbtrfsutil is a wrapper around the actual btrfs code. To do the actual creation of the snapshot, it invokes the kernel code through ioctl: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/subvolume.c#L1174

It's that kernel code that's supposed to create the snapshot and return the transaction id by setting a field in the args struct, which is defined here: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/ioctl.h#L98. That field is also present in the parallel definition in the btrfs-devel repo: https://github.com/kdave/btrfs-devel/blob/92477dd1faa650e50bd3bb35a6c0b8d09198cc35/include/uapi/linux/btrfs.h#L130

However, if you trace the ioctl invocation:

You'll find the place where EOPNOTSUPP is returned. It seems that libbtrfsutil passes an unsupported flag — namely, the one which asks the kernel code to return the transaction. It's set here: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/subvolume.c#L1156-L1157. That flag is no longer supported though (its definition is commented out in btrfs-devel), and you won't find any usages of transid in the ioctl implementation.

The transaction is set to 0 - essentially NULL (might've been better to use std::ptr::null() tbh). After that, the library uses btrfs_util_wait_sync to wait on a NULL async transaction id.

That's not quire correct. You declare a variable and set its value to 0. Then you send the address of that variable to libbtrfsutil — and the address is not zero/nullptr. The libbtrfs function detects that, adds the flag for ioctl and, if the ioctl invocation is successful, uses the pointer to write the transaction id into your variable, thus overwriting the zero that you initialized it with: https://github.com/kdave/btrfs-progs/blob/471b4cf7e3a46222531a895f90228ea164b1b857/libbtrfsutil/subvolume.c#L1178-L1179. After that, it'd be correct to use the newly written value of the variable to wait for the transaction. So, to summarize, your usage of the feature is correct; it's just that the feature is no longer supported by the new kernel code, even though libbtrfsutil still exposes it. It would work with an older version of the kernel (presumably up to version 5.7).

ypoluektovich commented 3 years ago

At the moment the library returns the error as reported by libbtrfsutil (enum btrfs_util_error). How do you suggest integrating the errno as well in the reported errors?

Option 1: add a cause to LibError and fill it with nix::errno::Errno (as returned by e. g. Errno::last()) (which conveniently implements Error). So it'd be a "btrfs error such-and-such" caused by "system error such-and-such".

Option 2: make LibError a composite type which has both a btrfs_util_error value and an Errno. Would require a nontrivial Debug implementation, but it shouldn't be too complex.

In any case, be sure to handle the case of errno 0 (aka "no error") :)