danielfrg / s3contents

Jupyter Notebooks in S3 - Jupyter Contents Manager implementation
Apache License 2.0
248 stars, 88 forks

Slowness #129

Closed. arunhallan closed this issue 3 years ago

arunhallan commented 3 years ago

Hi,

Great library - thanks.

When using Jupyter, browsing and opening S3 notebooks is quite laggy. When viewing the same notebooks via commuter, it's much faster.

Is there anything I can do to speed it up?

danielfrg commented 3 years ago

I would guess the cause is related to the combination of Jupyter and s3contents. The package uses core AWS tooling, which I imagine is relatively fast, but S3 itself is not the fastest in some cases.

Does it happen with a large number of notebooks?

arunhallan commented 3 years ago

No - I only have 1 or 2 files in each folder at the moment.

lydian commented 2 years ago

I am seeing the same issue. In my local testing on a prefix with 100 items, a straightforward boto3 list_objects call took less than 1 second, but s3contents took 6 seconds, which is more than 6 times as long!
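For anyone who wants to reproduce the baseline side of this comparison, here is a minimal sketch of timing a plain boto3 listing. It assumes the list_objects_v2 paginator (swapped in for the list_objects call mentioned above) and a "/" delimiter; the helper name time_listing and the bucket and prefix in the usage comment are placeholders, not anything from s3contents.

```python
import time


def time_listing(client, bucket, prefix):
    """Return (item_count, seconds) for one delimiter-based listing of a prefix.

    `client` is expected to behave like boto3.client("s3").
    """
    start = time.perf_counter()
    paginator = client.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        count += len(page.get("Contents", []))        # objects directly under the prefix
        count += len(page.get("CommonPrefixes", []))  # "subdirectories" under the prefix
    return count, time.perf_counter() - start


# Usage (needs boto3 and AWS credentials; bucket and prefix are placeholders):
#   import boto3
#   print(time_listing(boto3.client("s3"), "my-bucket", "notebooks/"))
```

Timing the same prefix through the Jupyter contents API would then give the other half of the comparison.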

My guess is that s3contents is also trying to get the last_modified time of each directory, so for every directory it needs to issue another query, which slows down the entire process. In most of my use cases, I don't think we really care about the last_modified time of a directory. Is it possible to make it an optional property so that we don't need to wait on it?
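To make the suspected cost concrete, here is a rough sketch of what such an optional lookup could look like. This is a hypothetical helper, not the actual s3contents code: the name list_dir and the with_dir_mtime flag are invented for illustration, and it handles only a single response page. With the flag off, a listing costs one request; with it on, it costs one request plus one per subdirectory.

```python
def list_dir(client, bucket, prefix, with_dir_mtime=False):
    """List one "directory" level of a bucket.

    Hypothetical helper, not s3contents code. `client` is expected to behave
    like boto3.client("s3"). Only the first response page is read (S3 returns
    at most 1000 keys per call).
    """
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    entries = [
        {"path": obj["Key"], "type": "file", "last_modified": obj["LastModified"]}
        for obj in resp.get("Contents", [])
    ]
    for cp in resp.get("CommonPrefixes", []):
        mtime = None
        if with_dir_mtime:
            # One extra request per subdirectory: this is the suspected slowdown.
            sub = client.list_objects_v2(Bucket=bucket, Prefix=cp["Prefix"], MaxKeys=1)
            objs = sub.get("Contents", [])
            mtime = objs[0]["LastModified"] if objs else None
        entries.append({"path": cp["Prefix"], "type": "directory", "last_modified": mtime})
    return entries
```

Under this assumption, a prefix with 100 subdirectories would need 101 sequential requests instead of 1, which would be consistent with the kind of gap reported above.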