anowell / algorithmia-fuse

Experimental: FUSE-based Algorithmia FileSystem

No support for connectors with path restrictions #1

Open anowell opened 8 years ago

anowell commented 8 years ago

Filesystems are hierarchical, and node lookup depends on the existence of the parent node: the kernel has no way to look up /foo/bar if /foo doesn't exist. This wasn't a problem until I got everything wired up to support connectors. In the current implementation, path restrictions cause a problem: the API won't confirm the existence of directories that are above the path restriction. (Note: this isn't about listing connectors or listing restricted directories, just about being able to check whether a specific one exists.)
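To make the parent-first constraint concrete, here is a rough sketch of how a path gets resolved one component at a time, the way the kernel issues successive FUSE lookups. The `exists` callback is hypothetical and stands in for whatever existence check the filesystem makes against the API (e.g. a HEAD request); it is not the project's actual code.

```python
import errno

def resolve(path, exists):
    """Walk an absolute path component by component, like successive FUSE lookups.

    `exists` is a hypothetical callback standing in for the API existence check.
    """
    current = ""
    for part in [p for p in path.split("/") if p]:
        current += "/" + part
        if not exists(current):
            # The walk stops at the first missing ancestor; deeper paths are never queried.
            raise FileNotFoundError(errno.ENOENT, "No such file or directory", current)
    return current or "/"
```

With a restriction of `/photos/foo`, the API may confirm `/dropbox/photos/foo` but refuse `/dropbox/photos`, so `resolve("/dropbox/photos/foo", ...)` fails at `/dropbox/photos` before it ever reaches the allowed directory.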

Short-term, I can support dropbox and s3 connectors only if the path restriction is an empty string ("*" does not work, as it prevents checking the existence of the connector itself). That's more than good enough to experiment with, but in the event that most of the other functionality does come together, I'm curious what opinions others might have on lifting those limitations. I see a few options:

  1. Accept that the only way to use a connector from a mounted virtual filesystem is with an empty string path restriction (not the default restriction).
  2. Allow HEAD requests to connector paths that still return 200 if the connector and directory exist. It doesn't require the ability to list (GET) said directory. Filesystems are perfectly happy to have unlistable directories, just not ENOENT parents. Fwiw, it seems a bit strange, or not very HTTP-correct, to have a path where HEAD returns 200 but GET returns 404. (Alternatively, GET could return 200 and an empty list, but that seems like it could be more confusing for normal API usage.)
  3. Fake it! For paths under a connector, if the API returns 401 unauthorized (which is how path restriction errors bubble up), let the filesystem create a fake inode for it identifying it as an unlistable directory. So if you stat ~/algofs/dropbox/not/a/valid/path, it would claim to be a valid directory.
  4. The 401 Unauthorized could actually return the connector's path restriction, so that the filesystem can figure out exactly which directories to fake.
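Option 3 could look roughly like the sketch below on the filesystem side. The `Inode` type, the `inode_for` helper, and the `under_connector` flag are all hypothetical names for illustration; the only behavior taken from the proposal is "401 under a connector means fake an unlistable directory".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Inode:
    # Hypothetical inode record; not the project's actual storage format.
    path: str
    listable: bool
    fake: bool = False

def inode_for(path: str, status: int, under_connector: bool) -> Optional[Inode]:
    """Decide what to insert for a directory lookup, given the API's HTTP status."""
    if status == 200:
        return Inode(path, listable=True)
    if status == 401 and under_connector:
        # Path restriction errors surface as 401, so pretend an unlistable directory exists.
        return Inode(path, listable=False, fake=True)
    return None  # caller reports ENOENT
```

The downside shows up immediately: any path under a connector that yields a 401 becomes a fake directory, which is why stat on an invalid path would claim it is a valid directory.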

(1, 2, and 4 have varying degrees of security implications.)

pmcq commented 8 years ago

Seems to me like * not working to check existence of the connector is a bug that should be fixed, but that still leaves the issue of how to allow for a restriction other than *.

  1. While possible, we can do better than that and I wouldn't want to advertise an FS without it
  2. Might not be exactly HTTP-correct, but I'm not sure there's another 2xx status code that would make more sense; more on this later
  3. Could potentially work, though seems like you could build out a strange filesystem with a ton of unlistable directories (e.g. anyone else's data directory, dropbox paths before you've auth'd against dropbox?)
  4. I think this is also clever, and might make the data API more usable (i.e. you can see why your connector failed to auth). However I think it might become more problematic if/when we'd support multiple paths for a connector (i.e. authorize usage on /Pictures/Spring2016 and /Thumbnails/Spring2016 but nothing else)

So maybe a compromise between 2 & 4 could be: if the path exists or is "up" the path restriction, HEAD returns 200; otherwise, the appropriate 404/401. E.g. if the path restriction is /photos/foo/bar, HEAD on /photos returns 200, but a GET returns 401. A HEAD on /p returns a 401, as would /photos/secret. I think it kind of makes sense from the filesystem perspective: I can cd to a directory /a/b/c even if I can't list /a/b.
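The proposed HEAD rule can be sketched as a small server-side function. This is only an illustration under a few assumptions: paths are relative to the connector root, `exists` is a hypothetical callback that queries the backing service, and restrictions are passed as a list so the rule also covers multiple paths like /Pictures/Spring2016 and /Thumbnails/Spring2016. GET behavior is not modeled.

```python
def _parts(path):
    return [p for p in path.split("/") if p]

def head_status(path, restrictions, exists):
    """HEAD status for a connector path under the 2 & 4 compromise (sketch)."""
    parts = _parts(path)
    for restriction in restrictions:
        r = _parts(restriction)
        if len(parts) < len(r) and r[:len(parts)] == parts:
            # Strict ancestor of the restriction: answered without calling the backing service.
            return 200
        if parts[:len(r)] == r:
            # At or below the restriction: a real existence check decides 200 vs 404.
            return 200 if exists(path) else 404
    return 401  # outside every restriction
```

For the /photos/foo/bar example, `head_status("/photos", ...)` returns 200 without touching the backend, while `/p` and `/photos/secret` both return 401. An empty-string restriction has no components, so every path falls through to the existence check.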

anowell commented 8 years ago

So glad I asked for opinions - that sounds like a very solid alternative! The clever use of the path restriction even saves the API server from making an API call to the connector's backing service to find out if it exists. :+1:

(definitely puntable work until this project proves its utility, but I just wanted to have an idea of how we could solve it, in particular because #3 had a few design implications for the filesystem inode storage)

anowell commented 8 years ago

Minor detail: it'd be nice to be able to distinguish listable from non-listable in that HEAD request.

e.g. if the restriction is `/photos/foo`, then querying `/photos` needs to add it to the filesystem as non-listable (`perms=0640`), but querying `/photos/foo` needs to insert it as listable (`perms=0750`). It could be determined by a follow-up GET request, but then listing a deep directory would basically require trying to list all of its ancestors. (None of this applies to regular data:// URIs because they are all listable.)
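If the HEAD response carried the listable bit, the filesystem could pick the directory mode from that single request. The sketch below shows the classification itself, using the 0640/0750 perms from the comment; the `dir_mode` helper is hypothetical, and how the server would convey the result (e.g. some response header) is an open assumption, not an existing API.

```python
import stat

NON_LISTABLE = 0o640  # ancestor of the restriction: traversal target only
LISTABLE = 0o750      # at or below the restriction

def dir_mode(path, restriction):
    """Return a directory st_mode for `path` given a single path restriction, or None."""
    parts = [p for p in path.split("/") if p]
    r = [p for p in restriction.split("/") if p]
    if len(parts) < len(r) and r[:len(parts)] == parts:
        return stat.S_IFDIR | NON_LISTABLE
    if parts[:len(r)] == r:
        return stat.S_IFDIR | LISTABLE
    return None  # outside the restriction: no inode to insert
```

Computing this server-side, once per HEAD, avoids the follow-up GET on every ancestor when listing a deep directory.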