destination-earth / DestinE_ESA_GFTS

Global Fish Tracking Service - DestinE DESP Use Case
https://destination-earth.github.io/DestinE_ESA_GFTS/
Apache License 2.0

enable browsing s3 via jupyter-fs #13

Closed: minrk closed this 7 months ago

minrk commented 7 months ago

@tinaok this adds an S3 browser to the JupyterLab sidebar:

[Screenshot (2024-04-03): the S3 browser in the JupyterLab sidebar]

@yuvipanda this is what I mentioned to you yesterday. It seems to work fine with JupyterLab 4, but I had to do some shenanigans to work around https://github.com/PyFilesystem/s3fs/issues/70, because our files are not created with S3FS (they are created with s3fs). S3FS makes the hard assumption that it created everything it might read; specifically, that an empty Object exists representing each directory level, which is not true in general. I did the definitely-totally-fine thing of catching the error raised when a directory lacks a corresponding Object, and creating those empty objects if they are missing.
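The write-enabled workaround described above can be sketched in isolation. This is not the PR's actual code: `FakeS3` and `getinfo_with_dir_fixup` are hypothetical stand-ins that just illustrate the idea of creating the missing empty "directory marker" Object on first failure, so subsequent lookups succeed directly.

```python
class FakeS3:
    """Minimal stand-in for an S3 bucket: a dict of key -> bytes."""

    def __init__(self):
        self.objects = {}

    def put_object(self, key, body=b""):
        self.objects[key] = body

    def has_key(self, key):
        return key in self.objects

    def keys_under(self, prefix):
        return [k for k in self.objects if k.startswith(prefix)]


def getinfo_with_dir_fixup(s3, path):
    """Look up `path`; if it is a directory that exists only implicitly
    (files beneath it, but no marker Object because another tool created
    them), create the empty marker and return directory info.

    The fallback therefore runs at most once per missing directory.
    """
    marker = path.rstrip("/") + "/"
    if s3.has_key(marker):
        # the marker Object exists: S3FS-style lookup would succeed
        return {"name": path, "is_dir": True}
    # no marker: the directory "exists" only via keys beneath it
    if not s3.keys_under(marker):
        raise FileNotFoundError(path)
    s3.put_object(marker)  # create the missing empty Object
    return {"name": path, "is_dir": True}
```

For example, a file written as `data/file.csv` by a tool that creates no directory markers still resolves: the first `getinfo_with_dir_fixup(s3, "data")` call creates the `data/` marker, and later lookups hit it directly.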

yuvipanda commented 7 months ago

our files are not created with S3FS (they are created with s3fs)

this is beautiful haha

yuvipanda commented 7 months ago

Hmm, so this just doesn't recognize 'directories' unless a specific empty object exists? So it probably won't work for readonly data buckets that don't do that?

minrk commented 7 months ago

So it probably won't work for readonly data buckets that don't do that?

My exact workaround won't, but you could do a read-only version of it that returns a fake Info model as if it were a real one, instead of creating the object and trying again. The advantage of my version is that it only takes the fallback path once for any given missing directory, and does the 'right' thing forever after.

minrk commented 7 months ago

This one appears to work for read-only:

```python
import fs.errors
from fs.info import Info, ResourceType
from fs_s3fs import S3FS

class EnsureDirS3FS(S3FS):
    def getinfo(self, path, namespaces=None):
        try:
            return super().getinfo(path, namespaces)
        except fs.errors.ResourceNotFound as e:
            # workaround https://github.com/PyFilesystem/s3fs/issues/70
            # check if it's a directory with no corresponding Object (not created by S3FS)
            # scandir/getinfo don't work on missing directories, but listdir does
            # if it's really a directory, return stub Info instead of failing
            try:
                self.listdir(path)
            except fs.errors.ResourceNotFound:
                raise e from None
            else:
                # return fake Info
                # based on S3FS.getinfo handling of root (`/`)
                name = path.rstrip("/").rsplit("/", 1)[-1]
                return Info(
                    {
                        "basic": {
                            "name": name,
                            "is_dir": True,
                        },
                        "details": {"type": int(ResourceType.directory)},
                    }
                )
```

yuvipanda commented 7 months ago

@minrk this is great!

How do you control the list of buckets that show up here?

minrk commented 7 months ago

We're only working with one bucket. I think you need to explicitly list each bucket you want to mount in the resources config.

Ours is here. The bucket name (or arbitrary subdir) is in the mount.
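For reference, listing a bucket in jupyter-fs looks roughly like this in `jupyter_server_config.py`. This is a hedged sketch: the bucket and subdir names are placeholders, and the `Jupyterfs.resources` option name should be checked against the jupyter-fs version in use.

```python
# Hypothetical jupyter_server_config.py fragment; names are placeholders.
# Each entry mounts one bucket (or an arbitrary subdirectory) in the sidebar.
c.Jupyterfs.resources = [
    {
        "name": "gfts-data",                  # label shown in the sidebar
        "url": "s3://my-bucket/some/subdir",  # PyFilesystem S3 URL
    },
]
```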

yuvipanda commented 7 months ago

Interesting! Was https://github.com/destination-earth/DestinE_ESA_GFTS/pull/13/files#diff-96599d676c72313e9986285fd7ab9d14b18d8bec6167a33056b85ad4d2529435R101 needed as well, even if you only want the sidebar to show up?

minrk commented 7 months ago

Yes, the listing requests use the contents API at special drive:/... subdirectories. A default FileContentsManager still serves the "root", so there's no noticeable effect on the regular file UI. I think you can still specify the root contents manager class if you need to.

I'm not 100% sure why this is implemented by overriding ContentsManager rather than replicating the API on a different endpoint.
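Concretely, the contents-manager override being discussed amounts to something like the following in `jupyter_server_config.py` (a sketch; the `jupyterfs.metamanager.MetaManager` class path is from the jupyter-fs docs and should be verified against the installed version):

```python
# Hypothetical jupyter_server_config.py fragment: route the contents API
# through jupyter-fs so drive:/... paths resolve. The default root is still
# served by a regular contents manager underneath MetaManager.
c.ServerApp.contents_manager_class = "jupyterfs.metamanager.MetaManager"
```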