danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0

Why are s3fs and gcsfs pinned to old versions in the requirements? #78

Closed: milutz closed this issue 4 years ago

milutz commented 5 years ago

Both s3fs and gcsfs are pinned in the requirements to versions more than a year old. Any particular reason why?

danielfrg commented 5 years ago

The main reason is that the APIs were changing pretty fast and I didn't have time to keep updating the code to match.

I would be happy to merge a PR that unpins them and fixes whatever the new APIs have.

milutz commented 5 years ago

@danielfrg Coolness! (and many thanks for the quick reply!)

I'm fighting a problem in my own code that I think may be fixed by newer versions - once I figure that out (if it's true), I'll circle back and submit a PR for the change - thanks for the background!!

milutz commented 5 years ago

... so I found a wrinkle with dask/s3fs newer than 0.2.2: the info() calls in s3_fs.py pass refresh=True (e.g. https://github.com/danielfrg/s3contents/blob/master/s3contents/s3_fs.py#L123), which is listed as a valid arg in the dask/s3fs docs (https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem.info), but per the code it isn't valid on dask/s3fs 0.3.0 or newer (e.g. https://github.com/dask/s3fs/blob/0.3.0/s3fs/core.py#L441). Still digging to see what to do about it.

This issue might be relevant : https://github.com/dask/s3fs/issues/207
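For anyone else hitting this, here's a minimal repro of the mismatch (the bucket/path is a placeholder, and the exact traceback text is from memory):

```python
import s3fs

fs = s3fs.S3FileSystem()

# On s3fs <= 0.2.x this works: info() accepts a refresh kwarg that
# bypasses the internal listing cache.
# On s3fs >= 0.3.0 (the fsspec-based rewrite) the same call raises
# something like:
#   TypeError: info() got an unexpected keyword argument 'refresh'
fs.info("some-bucket/notebooks/Untitled.ipynb", refresh=True)
```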

thbeh commented 5 years ago

Hi, I got the error message too; I've tried pinning s3fs==0.2.2. Looking at the s3fs code base I see quite a lot of refactoring in s3fs, so, not being a pro at this, I hope someone can pick it up and make the changes needed to keep it compatible with dask.

milutz commented 5 years ago

@thbeh I just now built with s3fs==0.2.2 and gcsfs>=0.3.0 from my test fork (https://github.com/milutz/s3contents/tree/patch-1), and I did get something that lets me start up Jupyter with c.NotebookApp.contents_manager_class = S3ContentsManager.
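For reference, the minimal jupyter_notebook_config.py I'm testing with looks roughly like this (bucket name is a placeholder; check the README of whichever version you install for the exact option names):

```python
# jupyter_notebook_config.py -- rough sketch; "some-bucket" is a placeholder
from s3contents import S3ContentsManager

c = get_config()

# Serve notebook contents from S3 instead of the local filesystem
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "some-bucket"

# Credentials can also come from the usual boto/env sources instead:
# c.S3ContentsManager.access_key_id = "..."
# c.S3ContentsManager.secret_access_key = "..."
```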

I haven't done much testing yet, but at least it now starts up. I'll play around a bit and see if anything explodes - please post if you know of any problems.

I need to get something running now, so I'm trying this quick fix of downgrading, but I suspect the real fix will be along the lines of what the dask issue I pointed at describes (https://github.com/dask/s3fs/issues/207): it suggests calling invalidate_cache() in place of passing refresh=True. It's unclear to me whether they do exactly the same thing, though.
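A sketch of what that replacement might look like (a hypothetical helper, not the actual s3contents code; assumes invalidate_cache(path) behaves as the fsspec docs suggest):

```python
import s3fs

def fresh_info(fs: s3fs.S3FileSystem, path: str) -> dict:
    # Hypothetical version-spanning helper for an uncached info() lookup.
    try:
        # s3fs <= 0.2.x: refresh=True bypasses the listing cache
        return fs.info(path, refresh=True)
    except TypeError:
        # s3fs >= 0.3.0: the kwarg is gone; drop the cached listing for
        # this path and re-query. Whether this is exactly equivalent to
        # refresh=True is the open question above.
        fs.invalidate_cache(path)
        return fs.info(path)
```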

@danielfrg do you have any insight from when you originally added refresh=True to the s3_fs.py calls?

thbeh commented 5 years ago

@milutz using s3fs==0.2.2 did work for me too, but it errors when I try using it in a dask environment, where dask deployments require s3fs>=0.3.0 to work with S3.

milutz commented 5 years ago

@thbeh Ah, that makes sense. Yes, I'll run into that problem soon too. I'll likely have to look in the invalidate_cache() direction, unless @danielfrg knows of some reason that's not a good fix.

DstephanAdvalo commented 4 years ago

Hi, I'm stuck with this s3fs version too. Do you have an idea of when you plan to fix this? Thank you for your work!

ophiry commented 4 years ago

A potential workaround I'm using is to create two separate virtual environments: one for Jupyter (with the pinned s3fs version) and one for the kernel (with newer versions).

zac-yang commented 4 years ago

I made a PR to support newer versions of s3fs. However, it seems that s3fs>=0.3 requires Python 3, causing the Travis CI build to fail. FYI @danielfrg https://github.com/danielfrg/s3contents/pull/86

danielfrg commented 4 years ago

Fixed in new versions.