Open cjacksudo opened 5 years ago
This is an interesting issue. It won't work in a notebook because the Python file lives on S3 and not in the session/kernel path from which modules can be imported.
One way to make this work would be to keep the Python files locally and use the HybridContentsManager, but this won't save the Python files to S3.
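For reference, a hybrid setup along those lines might look roughly like the sketch below. It follows the pattern shown in the s3contents README; the `local` mount name and the `/home/hadoop/local` directory are made-up placeholders, and credentials are assumed to come from the environment or an instance role rather than from the config.

```python
# jupyter_notebook_config.py: a sketch, not a tested configuration.
from hybridcontents import HybridContentsManager
from notebook.services.contents.filemanager import FileContentsManager
from s3contents import S3ContentsManager

c = get_config()  # provided by Jupyter when it loads this config file

c.NotebookApp.contents_manager_class = HybridContentsManager

c.HybridContentsManager.manager_classes = {
    # Everything at the root of the file browser is stored on S3.
    "": S3ContentsManager,
    # Hypothetical "local" folder backed by the local file system.
    "local": FileContentsManager,
}

c.HybridContentsManager.manager_kwargs = {
    "": {"bucket": "name-of-the-bucket-which-holds-your-notebooks"},
    # Placeholder directory; it must already exist on the notebook host.
    "local": {"root_dir": "/home/hadoop/local"},
}
```

Files placed under the local mount are real files on disk, so the kernel can import them once that directory is on `sys.path`, but as noted they never reach S3.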
A possibly nicer approach is to use `boto3` in a notebook to download all files from a given S3 path to the local file system, which ultimately achieves the same result. It should be possible to connect this action to a post-save hook too, so that files are downloaded automatically when they are saved (I haven't done that though; the occasional manual sync works for now).
It works roughly like this (assuming we want to sync files from a `utils` folder; also note that the paths are specific to AWS EMR):
```python
import os

import boto3

NOTEBOOK_BUCKET = "name-of-the-bucket-which-holds-your-notebooks"
UTILS_PATH = "jupyter/user_name/utils/"

s3 = boto3.resource("s3")

print(f"Loading '{NOTEBOOK_BUCKET}'")
bucket = s3.Bucket(NOTEBOOK_BUCKET)

for obj in bucket.objects.filter(Prefix=UTILS_PATH):
    # Path of the object relative to the utils prefix.
    path = obj.key[len(UTILS_PATH):]
    # Skip "folder" placeholder keys and hidden files such as .s3keep.
    if not path or path.endswith("/") or os.path.basename(path).startswith("."):
        continue

    path = f"utils/{path}"
    print(f"Downloading '{path}'")
    # Create the local target directory if it does not exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    bucket.download_file(obj.key, path)

print("Done!")
```
Note that we skip files starting with a dot (like the `.s3keep` files).
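Once the files are on the local disk, the kernel still has to find them, and a module that was already imported will not pick up a newer download on its own. A minimal sketch of how that could be handled with the standard library is below; `import_synced_module` is a hypothetical helper name, and it assumes the files were synced into a local `utils` directory as in the snippet above.

```python
import importlib
import os
import sys


def import_synced_module(module_name, search_dir="utils"):
    """Import (or re-import) module_name from the local search_dir."""
    search_dir = os.path.abspath(search_dir)
    # Make the synced directory importable for this kernel session.
    if search_dir not in sys.path:
        sys.path.insert(0, search_dir)
    # Clear the import system's cached directory listings so that files
    # written after the kernel started are found.
    importlib.invalidate_caches()
    if module_name in sys.modules:
        # Already imported: reload to pick up a freshly downloaded version.
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)
```

For example, after syncing a (hypothetical) `utils/helpers.py`, calling `helpers = import_synced_module("helpers")` would import it, and calling it again after the next manual sync would reload the updated code without restarting the kernel.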
Nevertheless, it would be very nice to have an option for turning on "automatic local syncing upon save" (at least for .py files).
When I'm using the local file system, it's possible to add my home directory to my python path such that in my notebook I can upload a python file to my root directory and then run a line like:
However, when using s3Contents, I get an import error. Interestingly, if I run the following, it looks like the file is in my home directory already:
However, if I actually look at that directory, the file is missing...
Is there a way to make this import work?