danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0
248 stars 88 forks source link

If S3 root doesn't allow writing s3contents(and jupyter) will die with untrapped exception #59

Closed milutz closed 5 years ago

milutz commented 5 years ago

When s3contents is starts, specifically in the s3_fs.py code https://github.com/danielfrg/s3contents/blob/master/s3contents/s3_fs.py#L71-L74 it calls of series of functions that I believe are intended to verify the filesystem.

The first is a self.mkdir("") which seems to translate into trying to run a mkdir on the root directory itself (which, because S3 isn't a normal unix filesystem, really creates a file under root, specifically the path [bucketname]/.s3keep)

The problem comes in if you don't have policy enabling you to write [bucketname]/.s3keep, you end up with a dead Jupyter because of the untrapped exception botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied

I'm not completely sure what would be the most sane solve for this, but wrapping the calls https://github.com/danielfrg/s3contents/blob/master/s3contents/s3_fs.py#L72-L74 with exception handling, at least for botocore AccessDenied seems like it might be a good start.

I'm currently working around the problem by making a one-off policy for /.s3keep

pankajtaneja commented 5 years ago

Hi Mike,

I encountered the same issue and was able to bypass it by using prefix along with the bucket name. This creates user level folder in S3.

c.S3ContentsManager.prefix = "path/after/bucketname/" + '{username}'.format(username=os.environ['JUPYTERHUB_USER'])

PS: I was not able to configure Serverside encryption in S3 and my team wanted me to pass below headers. I had to fork the repo to make additional changes:

x-amz-server-side-encryption: aws:kms x-amz-server-side-encryption-aws-kms-key-id: ${AWS_S3_KMS_KEY_ARN} x-amz-acl: private

milutz commented 5 years ago

@pankajtaneja Good to know that approach works too - I have other subsystems (expectations of the Jupyter user env) that would be broken if I change the prefix.

For your P.S. I would assume that should become an issue of its own (or if your fix is complete and controllable by config file settings, that you should do a Push Request back to this repo) to make sure it doesn't get lost - that sure sounds useful

danielfrg commented 5 years ago

Thanks for the report and potential workarounds!

I dont think there is a way to "fix" it since as you mentioned the .s3keep file is something that is needed since S3 is not a real file system.

Having said that I think you are right about raising that a a better exception and even printing a policy people can copy/paste into IAM. Would you be able to paste the json of the policy you use here so we can share it with people?

Also would be more than happy to accept that PR to customize the headers @pankajtaneja!

Thanks!

milutz commented 5 years ago

The block of json I added to my AWS policy was:

        {
            "Sid": "HackForS3contentsKeepFile",
            "Action": [
                "s3:*"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::mybucket/.s3keep"
            ]
        }

with "mybucket" being the name of the bucket that will be effected - unsure if there is a better way to write it

milutz commented 5 years ago

@danielfrg I have a side question on the .s3keep files.

If found a SO talking about how the S3 shows empty folders, it claims "Amazon S3 Management Console, it actually creates a zero-length object with the name of the folder" (SO: https://stackoverflow.com/q/45207198/3334178 )

Is there a reason this won't work for s3contents? (unsure this would help my issue, but thought I would ask before I dug into research for that)