archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3
Other
535 stars 77 forks source link

Run as NBD server instead of FUSE server #147

Closed Nikratio closed 2 years ago

Nikratio commented 3 years ago

As the manpage says, "s3backer should really be implemented as a device rather than a filesystem. However, this would require writing a kernel module instead of a simple user-space daemon, because Linux does not provide a user-space API for devices like it does for filesystems with FUSE".

It so happens that this facility exists in form of NBD. I would like to try to make NBD support available as an optional alternative mode of operation for s3backer.

I'm filing this bug mainly for tracking purposes to avoid duplicate work, and to give people one place to voice concerns / objections / ideas.

Nikratio commented 3 years ago

I think when running as an NBD server, it may make sense to force the s3backer block cache to reside on permanent storage.

With NBD, we are not limited to processing write requests sequentially. Rather, we will receive a bunch of concurrent writeback requests. This happens either as a result of an explicit fsync, background writeback, or due to system memory pressure.

In the first two cases, having the s3backer cache in memory means that the data is now stored twice - wasting memory that could otherwise be used to cache other data.

In the third case, we are freeing space in the the kernel's page cache, but at the same time requiring the memory usage of the s3backer userspace process. This is arguably somewhat better (userspace can be swapped out), but is not great either.

As far as I can tell, there is no advantage of putting the cache in memory either. For writes, the data will end up in the pagecache for the s3backer cache file first anyway, so there is no speed difference. For reads, it will be faster to read from memory than from disk. But a read request will only be issued to s3backer when the data has been evicted from the kernel's page cache - in other word, if the system is under memory pressure. So if we hadn't put the s3backer cache into memory, then we wouldn't have had to evict the data from the page cache in the first place.

Am I missing something?

archiecobbs commented 3 years ago

Hi @Nikratio ,

The in-memory cache is there mainly because it was developed first. Then, later, the on-disk cache was added as well.

I never use the in-memory version myself and I agree that it's probably not appropriate to be using it with NBD.

Rather than disallowing this configuration we could just advise against it in the documentation for NBD mode.

archiecobbs commented 3 years ago

Looks like a logical way to do this would be to write an nbdkit plugin.

Developer docs: https://libguestfs.org/nbdkit-plugin.3.html

Nikratio commented 2 years ago

Just pinging this as an FYI that I'm working on it.

Nikratio commented 2 years ago

MVP is now available in https://github.com/Nikratio/s3backer/tree/nbd.

There's still a fair number of things to be done, but this is sufficient to provide /dev/ndbX, and I was able to create and use a filesystem on it. Documentation is in https://github.com/Nikratio/s3backer/blob/nbd/README_nbd.md.

Comments and early feedback appreciated. I will provide a pull request when I have resolved (most) of the TODOs in the code comments. However, I will need help to sort out the Autoconf-config.

archiecobbs commented 2 years ago

FYI to anyone interested in this. Work is currently underway on branch nbd.

Nikratio commented 2 years ago

Uhm, I have been continuing to work on the pull request because it hadn't been merged yet (as far as I could see).

I guess I should rebase the changes on the nbd branch then? It's a bit tricky because the commit that you based the nbd branch on has since been amended.

archiecobbs commented 2 years ago

I had to make some changes that also needed to go into master, etc., and did several other code cleanups/refactorings. For example calculate_boundary_info() in 0ca29201 to clean up your patch. Also your patch used different coding indentation, etc.

Not trying to be NIMBY, it was just easier for me to deal with this way.

archiecobbs commented 2 years ago

Can you rebase your latest changes against the nbd branch?

Nikratio commented 2 years ago

I've just tried rebasing on the nbd branch, but it's a huge mess. Basically every change is a conflict because of the reformatting. Is there a way to do the reformatting on my pull request first (e.g. with a specific clang-format run)? Then I could do that and hopefully have a cleaner merge afterwards. If not, then I'm sorry but I do not have the energy and motivation to sort this out.

Nikratio commented 2 years ago

(I don't think there's anything wrong with doing the kind of refactorings and reformatting that you did. The problem is that you didn't tell me that you effectively merged the code already so to me it looked as if you'd just started reviewing it and I continued to refine it instead of working from your new branch).

archiecobbs commented 2 years ago

I'm happy to clean it up.

You've made some changes since my branch was created from your branch and these need to be factored back in. My branch is on the left most line below:

   ____ my nbd branch
 /
|    ____ my master branch
|  /
| |
* | commit 59a42fc63d8f9c660cedb209214a2e328ed2dd3d
| | Author: Nikolaus Rath <Nikolaus@rath.org>
| | Date:   Fri Apr 22 21:07:57 2022 +0100
| | 
| |     Add support for running as nbdkit plugin.
| |     
| |     This enables s3backer to be used as a nbdkit plugin.
| |     
| |     Fixes: #147
| |   
| | * commit b49dd299400d6ba509232b2e85a25aba0f763c8e (refs/remotes/Nikratio/nbd)     <--- your nbd branch
| | | Author: Nikolaus Rath <Nikolaus@rath.org>
| | | Date:   Fri Apr 29 21:08:59 2022 +0100
| | | 
| | |     Don't run external command to determine nbdkit plugin directory.
| | | 
| | * commit 4b6c0336e5f1ef2ba5083cc6e382cbbe1ed83eeb
| | | Author: Nikolaus Rath <Nikolaus@rath.org>
| | | Date:   Thu Apr 28 13:07:29 2022 +0100
| | | 
| | |     Added example script to create NBD-based zpool.
| | | 
| | * commit 9ac0ccb4f9552e0dff9234dcc4d423a15e3788cb
| | | Author: Nikolaus Rath <Nikolaus@rath.org>
| | | Date:   Thu Apr 28 19:37:10 2022 +0100
| | | 
| | |     Enable passing of arbitrary s3backer arguments.
| | | 
| | * commit ae5b999c5acd935b205cd6787bcb145254d86fb4
| |/  Author: Nikolaus Rath <Nikolaus@rath.org>
|/|   Date:   Fri Apr 22 21:07:57 2022 +0100
| |   
| |       Add support for running as nbdkit plugin.
| |       
| |       This enables s3backer to be used as a nbdkit plugin.
| |       
| |       Fixes: #147
| | 
* | commit 13d824ba03adc1ba91c3525bd0622ed19e78181b

Not sure how you duplicated 59a42fc6 into ae5b999c but their difference is trivial. So the main work is to rebase 9ac0ccb4, 4b6c0336, and b49dd299 onto my nbd branch.

To that point...

nbdkit.c

I think passing a single configFile=xxx flag (or even just xxx thanks to magic_config_key) to nbdkit(1) is a cleaner approach than putting all the flags on the command line with s3b_ prefixes. Your thoughts?

Otherwise, there are no real functional differences between your nbdkit.c and mine (except that I added a can_cache callback; also the trim and zero callbacks to use parallel bulk delete).

create_zpool.py

This is a new file you added, so no conflict.

Those are all the changes/conflicts I see...

Nikratio commented 2 years ago

Ok, that helps. Sorry that I was a bit grumpy yesterday. I will rebase my work on the nbd branch on your repository.

Note that this currently does not build:

FAILED: nbdkit-s3backer-plugin.so.p/nbdkit.c.o 
cc -Inbdkit-s3backer-plugin.so.p -I. -I.. -I/usr/include/x86_64-linux-gnu -I/usr/include/fuse -fdiagnostics-color=always -pipe -Wall -Winvalid-pch -Wextra -g -DHAVE_CONFIG_H -Wno-sign-compare -Wno-unused-parameter -Wstrict-prototypes -Wmissing-declarations -Wwrite-strings -fno-strict-aliasing -Wno-unused-result -fPIC -D_FILE_OFFSET_BITS=64 -pthread -MD -MQ nbdkit-s3backer-plugin.so.p/nbdkit.c.o -MF nbdkit-s3backer-plugin.so.p/nbdkit.c.o.d -o nbdkit-s3backer-plugin.so.p/nbdkit.c.o -c ../nbdkit.c
../nbdkit.c:133:6: error: ‘struct nbdkit_plugin’ has no member named ‘cleanup’
  133 |     .cleanup=               s3b_nbd_plugin_cleanup,
      |      ^~~~~~~
../nbdkit.c:133:29: warning: initialization of ‘int64_t (*)(void *)’ {aka ‘long int (*)(void *)’} from incompatible pointer type ‘void (*)(void)’ [-Wincompatible-pointer-types]
  133 |     .cleanup=               s3b_nbd_plugin_cleanup,
      |                             ^~~~~~~~~~~~~~~~~~~~~~
../nbdkit.c:133:29: note: (near initialization for ‘plugin.get_size’)
../nbdkit.c:133:29: warning: initialized field overwritten [-Woverride-init]
../nbdkit.c:133:29: note: (near initialization for ‘plugin.get_size’)

I'll include a fix in the rebased work.

archiecobbs commented 2 years ago

It (i.e., 1.6.3-65-g00ffc1b) compiles fine for me. We must have different NBDKit versions installed.

Here's what I have:

$ rpm -q nbdkit-devel
nbdkit-devel-1.29.4-150300.8.6.1.x86_64