danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0
248 stars 88 forks source link

Fix prefix stripping #94

Closed ericdill closed 4 years ago

ericdill commented 4 years ago

Only the one line really "matters". everything else was to aid in the debugging of wtf was going on.

-        prefix = self.bucket
+        prefix = self.bucket.lstrip("s3://")

So what am I fixing here? Note two things. (1) the timestamps are all the default timestamp of 50 years ago and (2) it thinks my s3 directory is a file so if I click on it, jupyter notebook vomits up the 404 : Not Found You are requesting a page that does not exist!" page. image

Clearly that's not what we want. Doing a bit of debugging in the logs, I see the following being spit out:

[D 13:57:46.346 NotebookApp] 200 GET /api/sessions?_=1589982508227 (99.92.55.67) 0.89ms
[D 13:57:46.347 NotebookApp] 200 GET /api/terminals?_=1589982508228 (99.92.55.67) 1.05ms
[D 13:57:46.372 NotebookApp] S3contents.GenericManager.get] path('') type(directory) format(None)
[D 13:57:46.372 NotebookApp] S3contents.GenericManager._get_directory: path('') content(1) format(None)
[D 13:57:46.373 NotebookApp] S3contents.GenericManager._directory_model_from_path: path('') type(1)
[D 13:57:46.373 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: []
[D 13:57:46.374 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER` is a directory: True
[D 13:57:46.374 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: []
[D 13:57:46.374 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['.s3keep']
[D 13:57:46.422 NotebookApp] S3contents.GenericManager.dir_exists: path('')
[D 13:57:46.423 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: []
[D 13:57:46.443 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER` is a directory: True
[D 13:57:46.444 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: []
[D 13:57:46.444 NotebookApp] S3contents.S3FS.ls: Listing directory: `s3://BUCKET/notebooks/NO_USER`
[D 13:57:46.463 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['BUCKET/notebooks/NO_USER/.s3keep', 'BUCKET/notebooks/NO_USER/Untitled Folder', 'BUCKET/notebooks/NO_USER/Untitled.ipynb']
[D 13:57:46.464 NotebookApp] S3contents.GenericManager.dir_exists: path('BUCKET/notebooks/NO_USER/Untitled Folder')
[D 13:57:46.464 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['BUCKET/notebooks/NO_USER/Untitled Folder']
[D 13:57:46.465 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled Folder` is a directory: False
[D 13:57:46.465 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['BUCKET/notebooks/NO_USER/Untitled Folder']
[D 13:57:46.466 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled Folder` is a file: False
[D 13:57:46.467 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['BUCKET/notebooks/NO_USER/Untitled.ipynb']
[D 13:57:46.467 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled.ipynb` is a file: False
[D 13:57:46.468 NotebookApp] 200 GET /api/contents?type=directory&_=1589982508229 (99.92.55.67) 96.96ms

Pulling out some of the interesting lines, we see the following:

[D 13:57:46.374 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER` is a directory: True
[D 13:57:46.443 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER` is a directory: True
[D 13:57:46.465 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled Folder` is a directory: False
[D 13:57:46.466 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled Folder` is a file: False
[D 13:57:46.467 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled.ipynb` is a file: False

So it seems that sometimes we're getting the directory correct and sometimes we're also doubling up on the bucket/key.

The additional logging in unprefix is what helps to explain what's going on. Here we see that unprefix is not working correctly as it's returning the following: s3://BUCKET/KEY/BUCKET/KEY

[D 13:57:46.464 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['BUCKET/notebooks/NO_USER/Untitled Folder']
[D 13:57:46.465 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled Folder` is a directory: False

[D 13:57:46.465 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['BUCKET/notebooks/NO_USER/Untitled Folder']
[D 13:57:46.466 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled Folder` is a file: False

[D 13:57:46.467 NotebookApp] S3FS.unprefix: self.prefix_: s3://BUCKET/notebooks/NO_USER path: ['BUCKET/notebooks/NO_USER/Untitled.ipynb']
[D 13:57:46.467 NotebookApp] S3contents.S3FS: `s3://BUCKET/notebooks/NO_USER/BUCKET/notebooks/NO_USER/Untitled.ipynb` is a file: False

So on to my hacky fix. If I strip the s3:// from the return of get_prefix, then everything seems to return to normal and we get the correct timestamps and the s3 directories are properly clickable and act as folders.

Is this good enough? Are there side effects that I'm not thinking of by stripping off s3:// from get_prefix so that unprefix works properly?

ericdill commented 4 years ago

Ok ignore ALL of this so far. User error :facepalm: :disappointed_relieved:

In my config, I had "bucket": "s3://BUCKET_NAME". So what I'll do instead is strip off the s3:// on S3ContentsManager initialization

danielfrg commented 4 years ago

lololol, this looks ok to me anyway. I little better UX.

ericdill commented 4 years ago

Ok. Now the logs show this when the novice user (i.e., me) provides bucket="s3://BUCKET" in the jupyter config:

[D 18:55:29.399 NotebookApp] s3manager._validate_bucket: User provided bucket: s3://BUCKET
[W 18:55:29.400 NotebookApp] s3manager._validate_bucket: Assuming you meant BUCKET for your bucket. Using 
that. Please set bucket=BUCKET in your jupyter_notebook_config.py file                                    
ericdill commented 4 years ago

Lol. make check and make fmt don't agree:

$ ~/dev/s3contents make check
./s3contents/s3manager.py:84:65: W291 trailing whitespace
./s3contents/s3manager.py:91:10: W291 trailing whitespace
./s3contents/s3manager.py:93:1: W293 blank line contains whitespace
./s3contents/s3manager.py:98:1: W293 blank line contains whitespace
make: *** [Makefile:61: check] Error 1
$ ~/dev/s3contents make fmt
Skipped 1 files
All done! ✨ 🍰 ✨
16 files left unchanged.