Closed pditommaso closed 9 years ago
I've further optimized the remote AWS API invocations with commit @c7abd6c. It mainly does two things:

- It caches the `S3ObjectSummary` instances retrieved by `S3Iterator` in the `S3Path`. This saves all the subsequent requests otherwise needed to find out the file attributes (it would be great to do the same for directories, but I haven't found a way).
- It changes the `S3ObjectSummaryLookup.lookup` method so that it uses the cached `S3ObjectSummary` when available. When it is not available, it uses a single `listObjects` call to retrieve the object metadata, regardless of whether the path is a file or a directory (previously a directory required two API calls).

These changes greatly reduce the number of requests when traversing a big directory structure with many files.
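To make the lookup order concrete, here is a minimal, self-contained sketch of the idea, not the code from the commit. `Summary`, `PathStub` and `FakeStore` are hypothetical stand-ins for `S3ObjectSummary`, `S3Path` and the S3 client, and the key-matching rule for directories is an assumption; the sketch only shows "use the cached summary if present, otherwise make one listing call that works for both files and directories".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class SummaryLookupSketch {

    /** Hypothetical stand-in for S3ObjectSummary: only the fields used here. */
    static final class Summary {
        final String key;
        final long size;
        Summary(String key, long size) { this.key = key; this.size = size; }
    }

    /** Hypothetical stand-in for S3Path; `cached` is set when the path came from a listing. */
    static final class PathStub {
        final String key;
        final Summary cached;
        PathStub(String key, Summary cached) { this.key = key; this.cached = cached; }
    }

    /** Hypothetical in-memory store that counts how many "remote" listing calls were made. */
    static final class FakeStore {
        final List<Summary> objects = new ArrayList<>();
        int listCalls = 0;

        /** Mimics a single prefix listing request. */
        List<Summary> listByPrefix(String prefix) {
            listCalls++;
            List<Summary> out = new ArrayList<>();
            for (Summary s : objects) {
                if (s.key.startsWith(prefix)) out.add(s);
            }
            return out;
        }
    }

    /** Returns the cached summary when present; otherwise resolves it with one listing call. */
    static Optional<Summary> lookup(FakeStore store, PathStub path) {
        if (path.cached != null) {
            return Optional.of(path.cached); // no remote request at all
        }
        for (Summary s : store.listByPrefix(path.key)) {
            // Exact match: the path is a file. "key/" prefix: the path is a directory.
            if (s.key.equals(path.key) || s.key.startsWith(path.key + "/")) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.objects.add(new Summary("dir/a.txt", 3));
        lookup(store, new PathStub("dir", null));
        System.out.println("listing calls: " + store.listCalls);
    }
}
```

Paths produced while iterating a listing carry their summary with them, so reading their attributes costs no extra request; only paths built from scratch fall back to the listing call.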
As a side effect, this also solves the NPE issue #28: it now correctly returns the timestamp data for a bucket object.
As a side note, `S3ObjectSummaryLookup` now contains a single method. I would suggest moving it into another class and removing the `S3ObjectSummaryLookup` class.
I've realised that the above patch was returning an invalid `ObjectSummary` when the S3 path specifies the bucket root. In fact, the bucket does not have any lastAccess/lastModified timestamp information.
I've pushed commit @9440808 to handle it correctly.
Thanks for everything. This PR is closed by #47.
The current implementation is very inefficient when listing the content of a bucket that contains a large number of items organised in many subdirectories.
The proposed PR addresses this issue by using the `ObjectListing.getCommonPrefixes` method, making it possible to list and navigate big buckets. Unfortunately, I was unable to fix the tests for the `S3Iterator` class; the Mockito library is not my cup of tea. I hope a core committer can help make the tests pass.
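For readers unfamiliar with common prefixes: when a listing request specifies a delimiter (typically `/`), S3 groups every key that contains the delimiter after the requested prefix into a single "common prefix" entry instead of returning each key, and the AWS SDK for Java exposes those entries through `ObjectListing.getCommonPrefixes()`. The sketch below simulates that grouping locally over an in-memory list of keys; `CommonPrefixSketch` and `Listing` are hypothetical names, and this is an illustration of the grouping rule rather than the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CommonPrefixSketch {

    /** Hypothetical result holder mirroring the two parts of a delimited listing. */
    static final class Listing {
        final List<String> objectKeys = new ArrayList<>();        // direct children ("files")
        final Set<String> commonPrefixes = new LinkedHashSet<>(); // one entry per "subdirectory"
    }

    /** Groups keys under `prefix` the way a delimited listing does. */
    static Listing list(List<String> allKeys, String prefix, String delimiter) {
        Listing result = new Listing();
        for (String key : allKeys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the requested "directory"
            }
            int cut = key.indexOf(delimiter, prefix.length());
            if (cut < 0) {
                // No further delimiter: the key is a direct child.
                result.objectKeys.add(key);
            } else {
                // Everything below this point collapses into a single entry.
                result.commonPrefixes.add(key.substring(0, cut + delimiter.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a/1.txt", "a/b/2.txt", "a/b/3.txt", "a/c/4.txt", "top.txt");
        Listing l = list(keys, "a/", "/");
        System.out.println("objects:  " + l.objectKeys);
        System.out.println("prefixes: " + l.commonPrefixes);
    }
}
```

The benefit for big buckets is that a subdirectory holding millions of keys shows up as one entry in its parent's listing, so navigating one level never requires enumerating everything beneath it.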