danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0

S3ContentsManager dask error on config initialization #75

Closed jpugliesi closed 5 years ago

jpugliesi commented 5 years ago

I've created a configuration to use the S3ContentsManager, but I'm receiving an error on startup that seems to be caused by an incompatibility between dask and an old gcsfs version.

Here's the s3contents configuration:

jovyan@jupyter-jovyan:~$ cat /etc/jupyter/jupyter_notebook_config.py
import os
from s3contents import S3ContentsManager

user = os.getenv('JUPYTERHUB_USER', 'jovyan')
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
aws_default_region = os.environ['AWS_DEFAULT_REGION']

# Tell Jupyter to use S3ContentsManager for all storage.
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "my-bucket"
c.S3ContentsManager.prefix = os.path.join("jupyterhub", user)
c.S3ContentsManager.access_key_id = aws_access_key_id
c.S3ContentsManager.secret_access_key = aws_secret_access_key
c.S3ContentsManager.endpoint_url = "https://s3.us-west-2.amazonaws.com"
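
The config above reads AWS_DEFAULT_REGION but hard-codes the us-west-2 endpoint. A minimal sketch of deriving the endpoint from the region instead, assuming the standard regional pattern `https://s3.<region>.amazonaws.com` (some partitions, such as the China regions, use a different domain). The helper name `s3_endpoint_for` is hypothetical and not part of s3contents:

```python
def s3_endpoint_for(region):
    """Build the regional S3 endpoint URL for a region name (hypothetical helper)."""
    if not region:
        raise ValueError("region must be a non-empty string")
    # Assumes the standard commercial-partition endpoint pattern.
    return "https://s3.{}.amazonaws.com".format(region)


# In jupyter_notebook_config.py this could replace the hard-coded URL:
# c.S3ContentsManager.endpoint_url = s3_endpoint_for(aws_default_region)
print(s3_endpoint_for("us-west-2"))
```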

And here's the error upon running jupyter notebook:

[E 17:52:29.615 NotebookApp] Exception while loading config file /etc/jupyter/jupyter_notebook_config.py
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/traitlets/config/application.py", line 562, in _load_config_files
        config = loader.load_config()
      File "/opt/conda/lib/python3.7/site-packages/traitlets/config/loader.py", line 457, in load_config
        self._read_file_as_dict()
      File "/opt/conda/lib/python3.7/site-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
        py3compat.execfile(conf_filename, namespace)
      File "/opt/conda/lib/python3.7/site-packages/ipython_genutils/py3compat.py", line 198, in execfile
        exec(compiler(f.read(), fname, 'exec'), glob, loc)
      File "/etc/jupyter/jupyter_notebook_config.py", line 2, in <module>
        from s3contents import S3ContentsManager
      File "/opt/conda/lib/python3.7/site-packages/s3contents/__init__.py", line 15, in <module>
        from .gcsmanager import GCSContentsManager
      File "/opt/conda/lib/python3.7/site-packages/s3contents/gcsmanager.py", line 8, in <module>
        from s3contents.gcs_fs import GCSFS
      File "/opt/conda/lib/python3.7/site-packages/s3contents/gcs_fs.py", line 3, in <module>
        import gcsfs
      File "/opt/conda/lib/python3.7/site-packages/gcsfs/__init__.py", line 4, in <module>
        from .dask_link import register as register_dask
      File "/opt/conda/lib/python3.7/site-packages/gcsfs/dask_link.py", line 56, in <module>
        register()
      File "/opt/conda/lib/python3.7/site-packages/gcsfs/dask_link.py", line 51, in register
        dask.bytes.core._filesystems['gcs'] = DaskGCSFileSystem
    AttributeError: module 'dask.bytes.core' has no attribute '_filesystems'
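
The last frame shows gcsfs writing to the private `dask.bytes.core._filesystems` registry, which the installed dask no longer exposes. A small sketch for probing that attribute without importing gcsfs; the helper `module_has_attr` is a hypothetical name introduced here for illustration:

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True/False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# The traceback fails on dask.bytes.core._filesystems, so probe for it directly.
result = module_has_attr("dask.bytes.core", "_filesystems")
print("dask.bytes.core._filesystems present:", result)
```

A result of False would match the AttributeError above, and None would mean dask (or that submodule) cannot be imported in this environment.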

I found a similar issue here: https://github.com/nteract/papermill/issues/342 - which was resolved by pinning the gcsfs module to gcsfs>=0.2.1.

Note that s3contents is pinned to gcsfs==0.1.2: https://github.com/danielfrg/s3contents/blob/master/requirements.txt#L6

I can confirm that the configuration loads successfully once I install a more recent gcsfs version:

jovyan@jupyter-jovyan:~$ pip install 'gcsfs>=0.2.1'
jovyan@jupyter-jovyan:~$ jupyter notebook
... (Successfully starts notebook server)

Thoughts on relaxing the s3contents requirement to gcsfs>=0.2.1?

danielfrg commented 5 years ago

I think that would be OK. I had it hard-pinned because gcsfs changed its API so often. Can you make a PR? If the tests pass, I am OK with having it like that.

thbeh commented 5 years ago

Hi, I am using Dask to read CSV files from S3, and it looks like it requires s3fs >= 0.3.0:

RuntimeError: 's3fs=0.1.5' is installed, but version '0.3.0' or higher is required.

Any chance of bumping s3fs to >= 0.3.0, or can I do it myself?
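
To see whether an environment meets the minimum versions discussed in this thread (gcsfs 0.2.1 and s3fs 0.3.0), a rough check along these lines may help. It is a sketch under stated assumptions: `importlib.metadata` needs Python 3.8 or newer (the traceback above is from Python 3.7), and the comparison is a simplified numeric one, not full PEP 440 handling. The helper names are hypothetical:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text, width=3):
    """Turn '0.2.1' into (0, 2, 1), padding short versions with zeros."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts.extend([0] * (width - len(parts)))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Minimum versions taken from the discussion above.
for package, minimum in [("gcsfs", "0.2.1"), ("s3fs", "0.3.0")]:
    found = installed_version(package)
    if found is None:
        print(package, "is not installed")
    elif meets_minimum(found, minimum):
        print(package, found, "satisfies >=", minimum)
    else:
        print(package, found, "is too old, need >=", minimum)
```

For anything beyond simple dotted versions, the `packaging` library's version parsing is the more robust choice.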