Improved listing of buckets with large number of items

pditommaso commented 9 years ago

The current implementation is very inefficient when listing the contents of a bucket that contains a large number of items organised in many subdirectories.

The proposed PR addresses this issue by using the ObjectListing.getCommonPrefixes method, making it possible to list and navigate big buckets.
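To illustrate the idea, here is a minimal sketch of what a delimiter-based listing yields. It is a local model in plain Java, not the AWS SDK and not the code in this PR: the `PrefixListingSketch` class, its `Listing` holder and the sample keys are all hypothetical. Given a flat key space, a prefix and a delimiter, it separates the objects directly under the prefix from the "common prefixes" (the subdirectories), which is the information getCommonPrefixes exposes without enumerating every nested key on the client side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Local model of a delimiter-based listing; no AWS calls are made.
class PrefixListingSketch {

    // Hypothetical holder for the two parts of a delimiter listing.
    static final class Listing {
        final List<String> objectKeys = new ArrayList<>();          // keys directly under the prefix
        final SortedSet<String> commonPrefixes = new TreeSet<>();   // one entry per "subdirectory"
    }

    static Listing list(Iterable<String> allKeys, String prefix, String delimiter) {
        Listing result = new Listing();
        for (String key : allKeys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the requested "directory"
            }
            String rest = key.substring(prefix.length());
            int idx = rest.indexOf(delimiter);
            if (idx < 0) {
                // No further delimiter: the key is a direct child.
                result.objectKeys.add(key);
            } else {
                // Everything below the next delimiter collapses into one prefix.
                result.commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "data/a/1.txt", "data/a/2.txt", "data/b/3.txt", "data/readme.md", "other/x");
        Listing listing = list(keys, "data/", "/");
        System.out.println("objects:  " + listing.objectKeys);
        System.out.println("prefixes: " + listing.commonPrefixes);
    }
}
```

In the real service this grouping happens server-side, so a directory with many nested files costs one entry per subdirectory rather than one entry per file.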

Unfortunately I was unable to fix the tests for the S3Iterator class. The Mockito library is not my cup of tea. I hope a core committer can help make the tests pass.

pditommaso commented 9 years ago

I've further optimized the remote AWS API invocations with this commit @c7abd6c. Mainly it does two things:

1. Cache the S3ObjectSummary instances retrieved by S3Iterator in the S3Path. This saves all the subsequent requests otherwise needed to find out the file attributes (it would be great to do the same for directories, but I haven't found a way).
2. Modify the S3ObjectSummaryLookup.lookup method so that it uses the cached S3ObjectSummary when available. If it is not available, it uses a listObjects call to retrieve the object metadata regardless of whether it is a file or a directory (previously, a directory required two API calls).

These changes greatly reduce the number of requests when traversing a big directory structure with many files.
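The cache-then-fallback lookup can be sketched roughly as follows. This is a simplified model in plain Java, not the code from the commit: `Summary`, `PathSketch` and `FakeStore` are hypothetical stand-ins for S3ObjectSummary, S3Path and the remote client, and the request counter exists only to make the saving visible.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of "use the cached summary, else make one remote call".
class CachedLookupSketch {

    // Hypothetical stand-in for an object summary.
    static final class Summary {
        final String key;
        final long size;
        Summary(String key, long size) { this.key = key; this.size = size; }
    }

    // Hypothetical stand-in for a path that may carry a summary cached during iteration.
    static final class PathSketch {
        final String key;
        final Summary cached; // null when the path was not produced by a listing
        PathSketch(String key, Summary cached) { this.key = key; this.cached = cached; }
    }

    // Fake remote store that counts how many requests it receives.
    static final class FakeStore {
        final Map<String, Long> sizes = new HashMap<>();
        int requests = 0;

        Summary fetch(String key) {
            requests++; // each call models one remote API request
            Long size = sizes.get(key);
            return size == null ? null : new Summary(key, size);
        }
    }

    // Prefer the cached summary; fall back to a single remote request.
    static Summary lookup(PathSketch path, FakeStore store) {
        if (path.cached != null) {
            return path.cached;
        }
        return store.fetch(path.key);
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.sizes.put("data/a/1.txt", 10L);

        PathSketch listed = new PathSketch("data/a/1.txt", new Summary("data/a/1.txt", 10L));
        PathSketch direct = new PathSketch("data/a/1.txt", null);

        lookup(listed, store);
        System.out.println("requests after cached lookup:   " + store.requests);
        lookup(direct, store);
        System.out.println("requests after uncached lookup: " + store.requests);
    }
}
```

Paths produced while iterating a listing already carry their summary, so reading their attributes costs no extra request; only paths built directly from a key reach the remote store.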

~~As a side effect, this also solves the NPE issue #28: it now correctly returns the timestamp data for a bucket object.~~

~~As a side note, S3ObjectSummaryLookup now contains a single method. I would suggest moving it into another class and removing the S3ObjectSummaryLookup class.~~

pditommaso commented 9 years ago

I've realised that the above patch returned an invalid ObjectSummary when the S3 path specifies the bucket root. In fact, a bucket does not have any lastAccess/lastModified timestamp information.

I've pushed the commit @9440808 to handle it correctly.

jarnaiz commented 9 years ago

Thanks for everything. This PR is closed by #47.

Upplication / Amazon-S3-FileSystem-NIO2

Improved listing of buckets with large number of items #34