djdv / go-filesystem-utils

ISC License
10 stars 2 forks source link

NFS host and guest support #40

Open djdv opened 1 year ago

djdv commented 1 year ago

Indirectly requested from https://github.com/ipfs/roadmap/issues/83 Creating a draft before the bot locks the thread.

This is still a WIP but basic functionality seems to work for both hosting and attaching. However, initial testing shows it is not reliable. Looking at the logs, there seems to be some kind of continuity problem in the NFS server or client where 2 lookup requests back to back will be processed in the wrong order. (Likely some kind of race condition.)

2023/11/21 13:21:51 [TRACE] request: RPC #4290061661 (nfs.Lookup)
2023/11/21 13:21:51 [TRACE] request: RPC #4290061662 (nfs.Lookup)
err on: Qmc2fSA6chjZMbXRi7JGQZ7tXEmDbtG2XF9FXRLCtGLBQV [] xid did not mat ch, expected: ffb5255e, received: ffb5255d
err on: Qmc2fSA6chjZMbXRi7JGQZ7tXEmDbtG2XF9FXRLCtGLBQV [] xid did not mat ch, expected: ffb5255d, received: ffb5255e

The error stems from this section of the client. and I noticed the server is handling sends on the socket within a goroutine (see references to this struct member). The issue could lie within the NFS libraries or my own implementations, so I'll have to investigate that when I get the chance.


Example server/host invocation: fs mount nfs pinfs /ip4/192.168.1.40/tcp/2049 I.e. fs mount nfs $guestAPI $nfsServerMaddr (Assuming an IPFS node is running on this machine with default settings. Otherwise pass -pinfs-api $ipfsAPIMaddr before the final multiaddr argument.)

Example guest/client invocation: fs mount fuse nfs -nfs-server /ip4/192.168.1.40/tcp/2049 I: I.e. fs mount $host nfs $nfsServerMaddr $mountPoint


CC: @willscott We spoke about utilizing this library for this purpose a very long time ago but I never got around to it until now. Letting you know about its existence. I'm planning on looking into the above error myself sometime, but you're more than welcome to hack around too. ;^]

willscott commented 1 year ago

Looking briefly at the client you linked I think that's right - the client is only coded to support 1 call occurring / outstanding at a time.

djdv commented 10 months ago

Had some time to look at this. Ended up just making the client lock between operations to prevent any overlapping requests. (And generally fixing up and finishing other parts.)

Still a few things left to do but the server and client seem to work. At least well enough for a smoke test. https://www.youtube.com/watch?v=19FkIxTzavY

willscott commented 10 months ago

try updating to the latest v0.0.2 of go-nfs and you may see the stale handle issue you ran into in the video go away.

This is very cool!

djdv commented 10 months ago

try updating to the latest v0.0.2 of go-nfs and you may see the stale handle issue you ran into in the video go away.

Did a quick test and it seems like it did fix that. :^] Some very minor changes to get it to build again, but afterwards the Go server and Linux client seemed to work alright.

However, https://github.com/willscott/go-nfs/pull/121 broke some stuff on my end. So I'm going to look into patching this branch to comply with the changes.

This is very cool!

Thanks!


Extra technical details on the breakage:

On the server side, within mountpoint.go:Mount(...), I'm basically taking an extended fs.FS and wrapping that in a billy.Basic implementation, which gets passed to polyfill, before being wrapped further with NFS specifics (NewNullAuthHandler+NewCachingHandler), and finally starts listening for connections.

The fs.FS implementations I've written, don't (yet) implement Lstat, so when I make target.Getattr calls from go-nfs-client[^1]), they're returning NFS3ERR_NOTDIR (specifically the logs say: [ERROR] call to 0x7ff6e6f6e0a0 failed: Not a directory: lstat ".": unsupported operation which is coming from the fs.FS Lstat extension function I have.[^2])

To remedy this I can just implement Lstat methods on all the fs.FS implementations I have. And also add a check for this up front in mountpoint.go:Mount, returning a clear error stating we require an Lstat method. This is related to the last part in https://github.com/djdv/go-filesystem-utils/issues/41 expressing the need for something similar to billy's Capabilities concept so that different abstraction layers can better interoperate.

[^1]: I do this in the fs.FS' Open method [^2]: polyfill's Lstat method also behaves this way.

djdv commented 10 months ago

Added symbolic link support to the IPFS fs.FS systems. Specifically Lstat, and Readlink; some systems can conditionally do Symlink (mklink) as well. Open should now implicitly try to resolve links, as long as they are relative to the fs.FS itself. Symlink support is not well tested yet, but these changes restore NFS support on go-nfs v0.2, which includes the fix for Linux's client. We could have got away with just doing Lstat, but proper symlink support had to happen eventually anyway.

Other link related changes were made in other parts of the code too (filesystem interface pkg + FUSE host). The NFS client implementation imposes similar link restrictions (as above). It will do a best effort inspection on link targets and try to resolve ones which are relative. There's a constructor option that allows setting a path delimiter token, so if you know your links will be delimited by a token other than / (such as somewhere\relative) you can construct the FS with an option that tells it to replace them in the target string. So that they get split properly instead of treated as a filename character (as opposed to a path separator)

When I get a chance I'm going to clean up some remaining TODOS, and do some testing with IPFS link nodes. After that should just need to review it and squash some commits.

willscott commented 10 months ago

happy to do a review pass when you're ready for it

djdv commented 9 months ago

This is probably good enough for a first pass. Will, I can't flag you as a reviewer, so I added you as a collaborator first. If you accept that I should be able to put you on this one. :^]

Some context on various packages:

/filesystem contains extensions to fs.FS, adding any functionality not present in the standard, with extension functions to match the interface extensions. All of our "guest" systems intend to implement parts of these interfaces.

Its subpkg /filesystem/errors essentially extends fs.PathError, adding a Kind type. Guests can define a general error "kind" which hosts APIs can inspect and translate into their native error values. E.g. an IPFS guest might return an fserror which can be picked up by a FUSE host, that can translate it into the specific errno value its API requires.

/nfs contains file system host & guest support utilizing the extensions in /filesystem. /ipfs are guest implementations that can be used by other host implementations. They've been adapted to utilize symlinks in this patchset.

All the other changes could be ignored, but you're welcome to look at them too if you want. Extra stuff below:


The general idea for /commands/mount.go is that it parses argv and transforms values into structures that hold parameters for the host and guest constructors. Those get marshaled and sent over the wire to a daemon process, which itself unmarshals the data, and constructs the guest fs.FS and passes it to the host instances Mount method. There are reasons for doing things that way, but the implementation is too complicated as-is. So I'm planning on refactoring it later to reduce some duplication and other funny business.

There will still be a need to keep "mountpoint" data serializable, but I think it can be done better than it currently is. 1 reason is that we need to send our CLI args to the daemon process, and 2 is that we need to be able to store the mountpoint on disk so that we can restore it later, like Unix's fstab. There's other interesting things you can do when the mount table is a living service in itself, such as changing instance parameters live, with or without needing to remount / migrate handles. As well as integrations with external tools, for example a context menu being able to know it's inside an IPFS mount point, and offer more functions like "copy CID", "copy gatewaylink" like an old shell-extension of mine Some of this already works: Desktop 2023.02.03 - 13.25.43.13.mp4, Desktop 2022.08.23 - 18.14.54.08.mp4) but isn't well documented since it's likely going to change.

djdv commented 9 months ago

I'm planning to look over these changes again when I get a chance, and afterwards will probably merge this in. If people use this and find bugs with it of course they can be patched later.

Will, thanks for the help on this. I appreciate you going beyond just the NFS parts too. I think we both agree that some of the parts of the codebase are complex and need some attention. I'm not sure yet how much it can be improved, but I'm going to at least investigate it and prototype some stuff. But that's going to be done in a separate PR later.

I'm in no rush to get this merged, so if you're still looking at the code, or plan on running some interactive tests against it, just let me know and I'll hold off on pressing the button.

willscott commented 9 months ago

I'm in no rush to get this merged, so if you're still looking at the code, or plan on running some interactive tests against it, just let me know and I'll hold off on pressing the button.

I wasn't planning to look at this again in the short term. Thanks for proving it out though, I'm excited it exists!

djdv commented 9 months ago

Sorry for the silence on this. Instead of reviewing it and merging it ASAP, I ended up trying to refactor a lot of the code related to mount. (I've also been tied up with unrelated troubles.)

I did end up doing a rather large refactor of the code base, but I'm not actually sure if it's an improvement. As such, I'm going to finish up that refactor, try to get this merged, then try to get the refactor merged after that.

Afterwards, I'm planning on recording a lengthy video orientation, with each section explaining each package of the code base, along with rationale for why it was done the way it is. The purpose of this is twofold, 1) it should help people to understand how the program works in case they want to modify it, and 2) act as an open invitation for suggestions. If someone watches it and thinks there's a better way to implement some part of the codebase, I'd be happy to hear it since this repo is rather complicated and I'm having a hard time reducing that complexity.

Personal note: I have to be gone for a few days but I'll act on this when I get back.


Edit (2024.03.19): Still working on a large refactor but I have to disappear for a few days again. Have a temporary commit up in this branch: j/mount-refactor, but it's probably not worth looking at yet. Didn't have time to split the commits up yet so it's one giant blob for now. I'll remedy that later.

I did port the FUSE host and the IPFS guests over to a newer style on that branch. And most of the command line flag stuff was redone as well. That branch is built on top of this one, but I still need to port the NFS host and guest support to the newer interfaces there.

Intended to have that finished this week, but got interrupted with something IRL again. Hopefully next week. When I posted previously, the current iteration of the refactor was not looking much better. But I think the one I landed on is probably somewhat better. Hard to gauge it - since I wrote it, it makes sense to me right now but we'll see in time if the code holds up in terms of clarity.