danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0

Ques: On s3 url #111

Closed daddydrac closed 3 years ago

daddydrac commented 3 years ago

In the file https://github.com/danielfrg/s3contents/blob/master/s3contents/s3manager.py, on line 20 you have:

endpoint_url = Unicode("https://s3.amazonaws.com", help="S3 endpoint URL").tag(
    config=True, env="JPYNB_S3_ENDPOINT_URL"
)

Is this the part of the URL you use to connect to the s3 bucket in order to read/write to it?

If not, where does that code exist? Please advise.

daddydrac commented 3 years ago

@ericdill could you please triage this question? Thank you in advance for your time.

ericdill commented 3 years ago

What problem are you currently having? What config have you set in your jupyter_notebook_config.py?

daddydrac commented 3 years ago

@ericdill: I’ll be honest, I can’t talk about it in the open due to security constraints and federal laws. I just need to know where and how you’re generating the string that connects to the bucket itself. I wish I could share any part of the code but I would be in a lot of trouble.

daddydrac commented 3 years ago

@ericdill I am trying to configure a special type of S3 for government use. I just need an explanation of how you are generating the URL/connection strings for the bucket.

ericdill commented 3 years ago

Have you looked through this codebase to see how and where endpoint_url is used? Ultimately, this code is a wrapper around dask/s3fs, which itself uses aiobotocore to handle all interaction with the S3 APIs, so all we're doing here is passing arguments down to those libraries. If you need to understand exactly how those connection strings are formed, you'll probably need to explore the aiobotocore library. I'm not particularly familiar with how those strings are formatted.
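To make the "passing args down" point concrete, here is a minimal sketch of how a wrapper could forward an endpoint to s3fs. The helper name build_s3fs_kwargs and the credential values are hypothetical, and this is not s3contents' actual code. It relies only on s3fs.S3FileSystem accepting key, secret, token and client_kwargs, where client_kwargs is handed on to the underlying botocore client.

```python
def build_s3fs_kwargs(access_key_id, secret_access_key, endpoint_url, session_token=None):
    """Collect the keyword arguments a wrapper could hand to s3fs.S3FileSystem.

    Hypothetical helper for illustration; not part of s3contents or s3fs.
    """
    return {
        "key": access_key_id,
        "secret": secret_access_key,
        "token": session_token,
        # client_kwargs is forwarded by s3fs to the botocore client it creates,
        # which is where the endpoint is actually used.
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


# Placeholder credentials, for illustration only.
kwargs = build_s3fs_kwargs(
    access_key_id="EXAMPLE_KEY_ID",
    secret_access_key="EXAMPLE_SECRET",
    endpoint_url="https://s3.amazonaws.com",
)
print(kwargs["client_kwargs"])

# With s3fs installed, the filesystem would then be created as:
#   import s3fs
#   fs = s3fs.S3FileSystem(**kwargs)
```

The wrapper never builds a connection string itself; it only chooses which endpoint the lower-level client talks to.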

daddydrac commented 3 years ago

@ericdill: Going back to the code -> endpoint_url = Unicode("https://s3.amazonaws.com", help="S3 endpoint URL")

Is https://s3.amazonaws.com a hard-coded string that is used statically throughout?

ericdill commented 3 years ago

endpoint_url is a variable that is passed to the s3fs library, which then passes it to aiobotocore to actually initiate a connection to whatever S3 provider you're using. If you want to use a different S3 endpoint, provide a different value for endpoint_url in your jupyter_notebook_config.py. If you look at the README, you'll see a bunch of things being set:

from s3contents import S3ContentsManager

c = get_config()

# Tell Jupyter to use S3ContentsManager for all storage.
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.access_key_id = "{{ AWS Access Key ID / IAM Access Key ID }}"
c.S3ContentsManager.secret_access_key = "{{ AWS Secret Access Key / IAM Secret Access Key }}"
c.S3ContentsManager.session_token = "{{ AWS Session Token / IAM Session Token }}"
c.S3ContentsManager.bucket = "{{ S3 bucket name }}"

# Optional settings:
c.S3ContentsManager.prefix = "this/is/a/prefix/on/the/s3/bucket"
c.S3ContentsManager.sse = "AES256"
c.S3ContentsManager.signature_version = "s3v4"
c.S3ContentsManager.init_s3_hook = init_function  # See AWS key refresh

If you add

c.S3ContentsManager.endpoint_url = whatever_url_you_want

then things will probably work. I assume you've already tried this and it didn't work?
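For illustration, the value assigned to endpoint_url is a plain string, since the trait is declared as Unicode. The fragment below is a sketch of a jupyter_notebook_config.py entry, not a tested configuration. The regional GovCloud endpoint shown is an assumption chosen as an example; the fix linked later in this thread is not reproduced here and may differ.

```python
# Fragment of jupyter_notebook_config.py; get_config() is provided by Jupyter
# when it loads this file, so this does not run as a standalone script.
from s3contents import S3ContentsManager

c = get_config()

c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "{{ S3 bucket name }}"

# Assumption for illustration: a regional S3 endpoint instead of the default
# "https://s3.amazonaws.com". Replace with the endpoint of your S3 provider.
c.S3ContentsManager.endpoint_url = "https://s3.us-gov-west-1.amazonaws.com"
```

The trait is also tagged with env="JPYNB_S3_ENDPOINT_URL", so the same value can presumably be supplied through that environment variable instead.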

daddydrac commented 3 years ago

Yes @ericdill, I did try that. As for endpoint_url, is this an access endpoint for performing bucket operations, or is it the URL of the bucket itself?

Note: at this point, I am also of the opinion that this does not work on AWS GovCloud.
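As background to the question above: in S3 terms, an endpoint URL names the service host, while the bucket and object key are added per request. The sketch below illustrates the two common S3 addressing styles using only the standard library. The function names are hypothetical, and in practice botocore/aiobotocore builds the real request URLs, so this is only a picture of the distinction and not how s3contents works internally.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def path_style_url(endpoint_url, bucket, key):
    """Path-style addressing: <endpoint>/<bucket>/<key>."""
    parts = urlsplit(endpoint_url)
    path = "/" + quote(bucket) + "/" + quote(key)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def virtual_hosted_url(endpoint_url, bucket, key):
    """Virtual-hosted-style addressing: <bucket>.<endpoint host>/<key>."""
    parts = urlsplit(endpoint_url)
    host = bucket + "." + parts.netloc
    return urlunsplit((parts.scheme, host, "/" + quote(key), "", ""))


endpoint = "https://s3.amazonaws.com"  # the service endpoint, not a bucket URL
print(path_style_url(endpoint, "my-bucket", "notebooks/a.ipynb"))
print(virtual_hosted_url(endpoint, "my-bucket", "notebooks/a.ipynb"))
```

In both styles the endpoint stays the same for every bucket; only the bucket and key portions change.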

daddydrac commented 3 years ago

Got it working and here is the answer: https://github.com/dask/helm-chart/pull/78#issuecomment-737403661

I'll close this as well now. Thank you for your help, @ericdill!