Closed jagane-opensource closed 4 years ago
This patch may be applied using: patch -p1 < patch-issue-13.txt patch-issue-13.txt
Here's a slightly improved version of the patch - more concise code, better named functions, etc. patch-issue-13-v2.txt
Hi Jagane, thanks for the patch, this is great! The performance is indeed dramatically better with your changes. However, I've noticed that it doesn't properly fetch objects that are nested a few "folders" deep (i.e. objects with prefixes like path/to/my/object
). I should have time to look into it further this Friday.
Would you like to open up a PR so you're properly recorded as the contributor for this improvement? If not, I can make the changes myself and list you as a contributor in the readme.
You are right, James - there was a bug in my patch that caused hierarchical listings to not work correctly. I fixed that now.
I actually created a PR on github - PR#13 and I have the latest version of the patch uploaded there. I also renamed a misnamed method and made the code more concise.
Jagane
On Wed, Apr 8, 2020 at 7:42 PM James Reeve notifications@github.com wrote:
Hi Jagane, thanks for the patch, this is great! The performance is indeed dramatically better with your changes. However, I've noticed that it doesn't properly fetch objects that are nested a few "folders" deep (i.e. objects with prefixes like path/to/my/object). I should have time to look into it further this Friday.
Would you like to open up a PR so you're properly recorded as the contributor for this improvement? If not, I can make the changes myself and list you as a contributor in the readme.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/IBM/jupyterlab-s3-browser/issues/13#issuecomment-611296442, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOUO6DDW2Y6CZR7BNE7XM5LRLUYZBANCNFSM4MDQZAWQ .
Hi Jagane,
Thanks for making the fix, but I don't see your PR. Can you try creating it again?
Hello James
When you say PR, do you mean the 'Issue' feature in github, or do you folks have a different bug tracking system? I apologize if this is described somewhere else, I must have missed it.
Thanks Jagane
On Thu, Apr 9, 2020 at 5:44 AM James Reeve notifications@github.com wrote:
Hi Jagane,
Thanks for making the fix, but I don't see your PR. Can you try creating it again?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/IBM/jupyterlab-s3-browser/issues/13#issuecomment-611507451, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOUO6DAOYRDMAHHCJO6T3QDRLW7MNANCNFSM4MDQZAWQ .
Hi Jagane,
In GitHub jargon a PR or Pull Request is a request to merge code from one branch into another. So on your fork of this repository, you can click Pull Request
(or just go directly to this link: https://github.com/jagane-opensource/jupyterlab-s3-browser/pull/new/master)
On the next page you can see a diff of all your changes and you can click the Create pull request
to make it visible to me.
Aha - I was thinking that PR means Problem Report, an archaic term for bug reports. I just created the github Pull Request, you should see it soon.
Best Jagane
On Thu, Apr 9, 2020 at 10:17 AM James Reeve notifications@github.com wrote:
Hi Jagane,
In GitHub jargon a PR or Pull Request is a request to merge code from one branch into another. So on your fork of this repository https://github.com/jagane-opensource/jupyterlab-s3-browser, you can click Pull Request (or just go directly to this link: https://github.com/jagane-opensource/jupyterlab-s3-browser/pull/new/master ) [image: open-a-pr] https://user-images.githubusercontent.com/12180828/78921993-c98c7d80-7a63-11ea-8984-f8de1e5409c3.png
On the next page you can see a diff of all your changes and you can click the Create pull request to make it visible to me.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/IBM/jupyterlab-s3-browser/issues/13#issuecomment-611646591, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOUO6DEP3KBRBN3VMRUA4A3RLX7KHANCNFSM4MDQZAWQ .
Thanks for opening the PR, let's continue discussion in the PR itself: https://github.com/IBM/jupyterlab-s3-browser/pull/14#pullrequestreview-391023593
Version 0.6.0 of this extension is now available with the fix for this issue from your PR. Thanks again for your contribution.
Reading the s3 bucket using the following snippet all_objects = [obj for obj in bucket.objects.filter(Prefix=path)]
The bucket.objects.filter(Prefix=path)] takes a long time for buckets that have millions of objects. This is because we don't process it hierarchically using a separator. Switching to the list_objects_v2 api makes things much faster. However, this implies certain things - for example, we must necessarily browse directories by sending a path with a trailing "/" and get objects using a path that does not include a trailing "/". Additionally, it feels like the browser components are the correct place to store the information as to whether a path is a directory or a file. In fact, we send this information to jupyterlab when we provide it with a listing. The patch included here does the following: