Closed pvanliefland closed 4 years ago
Hey @ericdill, obviously a work in progress, but wanted to create the PR so you can have a look.
Couldn't find a way to run the tests on my laptop so the PR will help with that too.
Couple of things that are failing.
you can update the test matrix in .github/workflows/test.yml
Ok, so I'm a bit ashamed, but as I couldn't find a way to run the tests locally, I used GitHub Actions to make sure that the tests pass.
I will of course rebase those ugly commits.
I think it is ready for another review @ericdill - I have added some tests and a basic check for stale chunked uploads.
The lint fails, I don't understand the issue with isort, on my local machine it does not generate any error...
In the meantime I will test it using a real Jupyter environment.
Ok I'll have a look later today.
To run tests locally, I think you need to do the following:
make env
conda activate s3contents
make minio
make test-all (after conda activate s3contents)
Yeah I'm not sure what's up with the lint. Master branch has an error that I can't reproduce locally. Might be a different python version locally vs GH actions. I'll poke at that later today.
Ok, tested the fork in a "real" project and it seems to work
Hi @pvanliefland @danielfrg,
Regarding backwards compat for python 3.6 specifically, I'm torn. On the one hand, supporting more python versions would be good. On the other, silent failures on file uploads is bad.
Is silent failures on uploads bad enough that we want to drop python 3.6 compatibility? Or should we maintain python 3.6 compat with the understanding that file uploads is broken in py3.6?
edit: Does this block show an error to the user if they try to upload a file?
Once this PR is in, you might consider submitting a PR to the chunked-saving part of the notebook docs that references this as an example implementation of chunked saving.
Is silent failures on uploads bad enough that we want to drop python 3.6 compatibility?
I am also torn, but I think that is bad enough, at least in my book. I am also in general happy to just drop things without thinking much about it lol, but I can be convinced otherwise.
@danielfrg it might not be a silent failure? Looks like this code path will 500. I wonder if that's seen by the user.
if not CHUNKS_HANDLING:
    self.log.error(
        "S3contents.GenericManager._save_large_file: not available with Python < 3.7"
    )
    self.do_error("Chunked file upload in S3/GCS requires Python >= 3.7", 500)
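For context, a flag like CHUNKS_HANDLING in the snippet above is typically set by attempting the import at module load time. A minimal sketch (the flag name mirrors the snippet; the exact try/except shape is my assumption, not necessarily the PR's code):

```python
# Sketch: derive a feature flag from whether contextvars is importable.
# contextvars is in the stdlib from Python 3.7 onwards, so this guard
# effectively detects Python < 3.7.
try:
    import contextvars  # noqa: F401
    CHUNKS_HANDLING = True
except ImportError:
    CHUNKS_HANDLING = False
```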
Ah, i see. If we log it and show it to the notebook UI then I think we should keep Python 3.6.
@ericdill @danielfrg unfortunately, I've tested how errors are handled and it seems that although they appear in the console, they are not displayed to the user.
It means that in the context of chunked uploads, the file won't be uploaded at all (compared to now, where a truncated version of the file is uploaded), no message will be displayed, but the browser console prints a meaningful message.
Let me know how you want to proceed.
Thanks for all the work on this!
In that case my vote would be to drop python 3.6. If you agree @ericdill then we can just do that.
yea, agreed. let's say farewell to python 3.6. continued silent failures on uploads are not a great user experience
I just tried to use your branch and things worked as expected 🎉 when the folder in the contents manager was a path to an s3 bucket. When I tried to upload to a directory backed by a FileContentsManager, the same behavior happens (truncated uploads). I suspect we should add some guidance to the README here in this repo that you should consider using the LargeFileManager instead of the FileContentsManager if you're having trouble with uploads.
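For anyone hitting this, opting into LargeFileManager might look roughly like the following in jupyter_notebook_config.py. This is a sketch: the import path is the classic notebook server's, and `c` is the config object Jupyter injects when loading the file; adjust for your Jupyter version and setup.

```python
# Sketch: use the chunk-aware LargeFileManager instead of FileContentsManager
# for local directories. Import path is for the classic notebook server.
from notebook.services.contents.largefilemanager import LargeFileManager

c.NotebookApp.contents_manager_class = LargeFileManager
```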
@ericdill @danielfrg Ok, this should be ready to be merged.
Latest changes: the README now recommends LargeFileManager instead of FileContentsManager (LargeFileManager has been the default manager in Jupyter notebook for a while now).
Thanks for the pr @pvanliefland ! Going to assume @danielfrg is ok with this :) We can always fix any issues @danielfrg has in a follow-on PR.
Cool!
This PR addresses #80
This implementation uses contextvars to store the chunks in memory until the last chunk is uploaded (at that point we send the file to S3 or GCS).
Note: using contextvars requires Python 3.7.
It should work for both S3 and GCS.
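The chunk-accumulation idea described above could be sketched like this. Names such as store_chunk and _chunks_var are illustrative only, not the PR's actual API; in the real manager the assembled content would be handed to the S3/GCS backend rather than returned:

```python
import contextvars

# Per-context store of partial uploads, keyed by file path.
# Using a ContextVar keeps concurrent uploads from clobbering each other.
_chunks_var = contextvars.ContextVar("chunks", default=None)

def store_chunk(path, chunk, is_last):
    """Buffer a chunk; on the last chunk, return the assembled content."""
    chunks = _chunks_var.get()
    if chunks is None:
        chunks = {}
        _chunks_var.set(chunks)
    chunks.setdefault(path, []).append(chunk)
    if is_last:
        # In the real manager this is where the file would be sent to S3/GCS.
        return "".join(chunks.pop(path))
    return None
```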
Todo:
- chunks utility module
- save() in generic manager
- wrap contextvars in try/except and disable chunks for Python < 3.7