I managed to access the S3 content from my notebook with the s3fs module:

import os
import s3fs

# Build a filesystem object from the credentials exposed as environment variables
s3 = s3fs.S3FileSystem(
    key=os.environ['JUPYTERLAB_S3_ACCESS_KEY_ID'],
    secret=os.environ['JUPYTERLAB_S3_SECRET_ACCESS_KEY'],
    anon=False,
    client_kwargs={'endpoint_url': os.environ['JUPYTERLAB_S3_ENDPOINT']},
)
s3.ls('my-bucket')
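Reading a file works through the same filesystem object, e.g. (bucket and key names here are placeholders, not from my setup):

# Open a key in the bucket like a regular file
with s3.open('my-bucket/data.csv', 'rb') as f:
    content = f.read()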
It's working, but is there a simpler way to get this s3fs access?
I don't want my users to deal with the S3 credentials if they want to access the S3 content from their notebook.
@PhE I am also facing this problem. Have you solved it? If so, could you please share the solution? Thanks! :)
@TerenceLiu98 I solved it with a different approach: I use rclone to mount the S3 bucket, started as a background process. In my case this is a Kubernetes pod with 2 containers (one for rclone, the other for Jupyter).
It is more stable than s3fs and users can browse the bucket as usual. A sketch of the idea is below.
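A minimal sketch of the background-mount idea in Python, assuming an rclone remote named "mys3" is already configured (the remote name, bucket, and mount point are illustrative, not from the original post):

import subprocess

# Mount the bucket in the background; --vfs-cache-mode writes makes the
# FUSE mount behave more like a regular filesystem for write workloads.
subprocess.Popen([
    'rclone', 'mount', 'mys3:my-bucket', '/home/jovyan/s3',
    '--vfs-cache-mode', 'writes',
])

In the two-container setup, the same rclone command simply runs in a sidecar container instead of a subprocess.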
Closing: I found the workaround described above.
I solved the problem in a similar way, but with juicefs instead of rclone. My environment is Docker only, so I could not bind them into one pod. However, juicefs needs the privileged option since it is FUSE-based; does rclone need it as well?
rclone does not require the privileged option; a simple rclone mount is enough. We also use rclone sync as a cheap local file cache.
A good point, compared to juicefs, is that the file/folder hierarchy is preserved in the S3 bucket: what you see in the bucket are the real file and folder names, not some cryptic, unusable chunks. A sketch of the sync-as-cache trick is below.
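A hedged sketch of the sync-as-cache idea, reusing the illustrative "mys3" remote from above (the paths are assumptions, not from the original comment):

import subprocess

# Copy the bucket contents to local disk so reads hit the local cache
# instead of S3; re-running the sync refreshes the cache.
subprocess.run(['rclone', 'sync', 'mys3:my-bucket', '/tmp/s3-cache'], check=True)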
I found a CSI driver (k8s-csi-s3) that exposes S3 through a StorageClass and uses geesefs (which may perform better than rclone) for the POSIX layer. It is a good fit for combining JupyterLab and S3. I have tried both k8s-csi-s3 and the juicefs CSI driver, and both work well; however, juicefs needs an extra database for its metadata storage.
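For illustration, claiming such a volume from Python could look roughly like this; the StorageClass name "csi-s3", the namespace, and the size are assumptions, not taken from the k8s-csi-s3 docs:

from kubernetes import client, config

# Create a PVC backed by the (assumed) S3-backed StorageClass; JupyterLab
# pods can then mount the claim like any other volume.
config.load_kube_config()
pvc = {
    'apiVersion': 'v1',
    'kind': 'PersistentVolumeClaim',
    'metadata': {'name': 'jupyter-s3'},
    'spec': {
        'accessModes': ['ReadWriteMany'],
        'storageClassName': 'csi-s3',
        'resources': {'requests': {'storage': '10Gi'}},
    },
}
client.CoreV1Api().create_namespaced_persistent_volume_claim('default', pvc)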
The S3 content is accessible through the browser pane on the left, but it is not visible to the Python code running inside the notebook.
If I have a notebook.ipynb along with a data.txt file in the same folder, reading data.txt from code in the notebook will fail. I understand the S3 content can't be exposed as a filesystem to the Python kernel, but we should have a way to access the S3 content from Python.
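A minimal reconstruction of the failing pattern (the original snippet is missing; presumably it was a plain local read like this):

# data.txt is visible in the S3 browser pane but does not exist on the
# kernel's local filesystem, so a plain open() raises FileNotFoundError.
with open('data.txt') as f:
    print(f.read())

Going through s3fs from the kernel, as in the first comment above, is one way around this.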