danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0

Adding S3 contents to python path #56

Open cjacksudo opened 5 years ago

cjacksudo commented 5 years ago

When I'm using the local file system, I can add my home directory to my Python path. I can then upload a Python file to my root directory and import it from a notebook with a line like:

from my_uploaded_file import *

However, when using s3contents, I get an import error. Interestingly, if I run the following, it looks like the file is already in my home directory:

In:
     import os
     file_path = os.path.abspath("my_uploaded_file.py")
     print(file_path)
Out:
   '/home/jovyan/my_uploaded_file.py'

However, if I actually look at that directory, the file is missing...

Is there a way to make this import work?
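
The path printed above does not show that the file is on local disk: `os.path.abspath` only joins the given name onto the current working directory and never touches the file system. A minimal sketch of that behaviour, assuming a throwaway temporary directory stands in for the kernel's working directory:

```python
import os
import tempfile

# Use an empty temporary directory as a stand-in for the kernel's working directory.
with tempfile.TemporaryDirectory() as workdir:
    previous_cwd = os.getcwd()
    os.chdir(workdir)
    try:
        # abspath builds a path string from the cwd; it does not check the disk.
        candidate = os.path.abspath("my_uploaded_file.py")
        is_in_cwd = os.path.dirname(candidate) == os.getcwd()
        # exists is the call that actually looks at the file system.
        exists = os.path.exists(candidate)
    finally:
        os.chdir(previous_cwd)

print(f"path points into cwd: {is_in_cwd}")
print(f"file exists on disk: {exists}")
```

Checking `os.path.exists` (or listing the directory) is the reliable way to see whether a file uploaded through the contents manager is actually visible to the kernel.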

danielfrg commented 5 years ago

This is an interesting issue. It won't work in a notebook because the Python file lives on S3, not in the session/kernel path from which it could be imported.

One way to make this work would be to keep the Python files locally and use the HybridContentsManager, but this won't save the Python files to S3.

GergelyKalmar commented 3 years ago

A possibly nicer approach is to use boto3 in a notebook to download all files from a given S3 path to the local file system, which makes them importable by the kernel. It should also be possible to connect this action to a post-save hook and download files automatically when they are saved. I haven't done that, though; the occasional manual sync works for now.

It works roughly like this, assuming we want to sync files from a utils folder (the paths are specific to AWS EMR):

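import os

import boto3

NOTEBOOK_BUCKET = "name-of-the-bucket-which-holds-your-notebooks"
UTILS_PATH = "jupyter/user_name/utils/"

s3 = boto3.resource("s3")
print(f"Loading '{NOTEBOOK_BUCKET}'")
bucket = s3.Bucket(NOTEBOOK_BUCKET)
for obj in bucket.objects.filter(Prefix=UTILS_PATH):
    # Strip only the leading prefix; str.replace would also remove later occurrences.
    relative_path = obj.key[len(UTILS_PATH):]
    # Skip folder placeholder keys, which have nothing to download.
    if not relative_path or relative_path.endswith("/"):
        continue
    # Skip dotfiles such as .s3keep.
    if os.path.basename(relative_path).startswith("."):
        continue
    path = os.path.join("utils", relative_path)
    print(f"Downloading '{path}'")
    # Create parent folders as needed; exist_ok makes a prior existence check unnecessary.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    bucket.download_file(obj.key, path)
print("Done!")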

Note that we skip files starting with a dot (like the .s3keep files).
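
One practical detail after a manual sync: a kernel that has already imported a module keeps the old version in memory until the module is reloaded. The following is a minimal sketch of that, using only the standard library. The `utils_demo` package name and the temporary directory are hypothetical stand-ins for the synced `utils` folder and its parent directory.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the directory that contains the synced folder.
sync_root = tempfile.mkdtemp()
os.makedirs(os.path.join(sync_root, "utils_demo"))
module_file = os.path.join(sync_root, "utils_demo", "helpers.py")

# First "sync": the module as initially downloaded.
with open(module_file, "w", encoding="utf-8") as fh:
    fh.write("VALUE = 1\n")

# Make the parent directory importable and let the import system notice new files.
sys.path.insert(0, sync_root)
importlib.invalidate_caches()
helpers = importlib.import_module("utils_demo.helpers")
first = helpers.VALUE

# Second "sync": the file changed on S3 and was downloaded again.
with open(module_file, "w", encoding="utf-8") as fh:
    fh.write("VALUE = 2  # changed after a re-sync\n")

# Without reload, the kernel would keep using the version imported earlier.
helpers = importlib.reload(helpers)
second = helpers.VALUE

print(first, second)
```

The folder needs no `__init__.py` here because Python treats it as a namespace package, which matches a synced folder that contains only `.py` files.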

GergelyKalmar commented 3 years ago

Nevertheless, it would be very nice to have an option for turning on "automatic local syncing upon save" (at least for .py files).
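
Such an option could be approximated with a save hook. Below is a rough sketch of the hook body only, not a tested integration. It assumes the hook receives the saved model (a dict with `type`, `format` and `content` keys, as in Jupyter's contents API) and the saved path as a keyword argument named `s3_path`. The function name, the `local_root` parameter and the `synced` default are hypothetical, and what the path argument contains (for example whether it includes a bucket prefix) depends on the contents manager and should be checked before use.

```python
import os

LOCAL_SYNC_DIR = "synced"  # hypothetical local target directory


def mirror_py_on_save(model, s3_path=None, local_root=LOCAL_SYNC_DIR, **kwargs):
    """Write a just-saved text .py file under local_root.

    Returns the local path that was written, or None if the save was skipped.
    """
    # Only plain-text files ending in .py are mirrored; notebooks and folders are skipped.
    if model.get("type") != "file" or model.get("format") == "base64":
        return None
    if not s3_path or not s3_path.endswith(".py"):
        return None
    content = model.get("content")
    if not isinstance(content, str):
        return None

    local_path = os.path.join(local_root, s3_path.lstrip("/"))
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    with open(local_path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return local_path


# Assumed registration in the Jupyter config file (verify against your version):
# c.S3ContentsManager.post_save_hook = mirror_py_on_save
```

Writing the content from the model avoids a second round trip to S3, at the cost of relying on the model still carrying its content when the hook runs. If it does not, downloading the object with boto3 inside the hook would be the fallback.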