Consider retrieving object summaries individually

martinklepsch commented 8 years ago

Currently we download object summaries instead of getting each object's data with an individual request.

1. This is faster when we need sync information for a lot of objects.
2. If the number of objects in the bucket grows (1000s), this gets slower.
3. If the number of objects to sync is small, retrieving their data individually might be faster.
4. Retrieving objects individually will not be enough when pruning the bucket.

Getting each object's data individually, as mentioned in 3, would also allow diffing and syncing of metadata.
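
To make the trade-off above concrete, here is a rough sketch that only counts requests. `FakeBucket` and the helper functions are hypothetical names for illustration, not the s3-deploy API; the one S3 detail assumed is that a listing call returns at most 1000 keys per page.

```python
import math

PAGE_SIZE = 1000  # S3 list calls return at most 1000 keys per page


class FakeBucket:
    """Hypothetical in-memory stand-in for a bucket that counts requests."""

    def __init__(self, objects):
        self.objects = dict(objects)  # key -> etag
        self.requests = 0

    def list_summaries(self):
        """Return key -> etag for every object, one request per page."""
        self.requests += max(1, math.ceil(len(self.objects) / PAGE_SIZE))
        return dict(self.objects)

    def head(self, key):
        """Return the etag for one key (None if missing), one request."""
        self.requests += 1
        return self.objects.get(key)


def changed_via_summaries(bucket, local):
    """Local keys whose etag differs from the listed remote etag."""
    remote = bucket.list_summaries()
    return sorted(k for k, etag in local.items() if remote.get(k) != etag)


def changed_via_individual(bucket, local):
    """Same result, but one request per local key."""
    return sorted(k for k, etag in local.items() if bucket.head(k) != etag)


def prunable(bucket, local):
    """Remote keys with no local counterpart; this needs the full listing."""
    return sorted(set(bucket.list_summaries()) - set(local))
```

In this sketch the listing cost grows with the size of the bucket, the individual cost grows with the number of local files, and `prunable` cannot be answered from individual requests alone.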

There is no clear right way in this case. I see the following options:

1. Add some logic that decides which approach to use
2. Add an option that allows users to decide how to get bucket information

I think adding logic is opaque and might confuse users, so I'm thinking the latter option is best.
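
A user-facing option along those lines might look roughly like the sketch below. The `strategy` values and the `list_all` / `head_one` callables are hypothetical, chosen only for this example; nothing here is an existing s3-deploy option.

```python
def remote_etags(local_keys, strategy, list_all, head_one, prune=False):
    """Return key -> etag for the remote side using the chosen strategy.

    All names are assumptions for this sketch: list_all() returns
    key -> etag for the whole bucket, head_one(key) returns the etag of
    one object or None if it does not exist.
    """
    if strategy == "summaries":
        return list_all()
    if strategy == "individual":
        if prune:
            # Pruning must see remote keys that have no local counterpart,
            # which per-key requests can never discover.
            raise ValueError("pruning requires the 'summaries' strategy")
        found = {}
        for key in local_keys:
            etag = head_one(key)
            if etag is not None:
                found[key] = etag
        return found
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Failing loudly on the prune + individual combination keeps the behaviour explicit instead of silently switching approaches.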

@podviaznikov any opinion to offer?

/via #9

podviaznikov commented 8 years ago

I wonder what the time difference would be for, say, 100 objects? You can send individual requests in parallel, right?
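
As a rough illustration of the parallel idea, something like the following could fan the individual requests out over a thread pool. `fetch_one` is a hypothetical callable standing in for whatever performs the per-object request; the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(keys, fetch_one, max_workers=16):
    """Call fetch_one(key) for every key concurrently; return key -> result."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input keys.
        return dict(zip(keys, pool.map(fetch_one, keys)))
```

Threads fit here because the work is dominated by waiting on the network rather than by CPU.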

martinklepsch commented 8 years ago

Will need to check that.

On Wed, 9 Dec 2015 at 03:09, Anton Podviaznikov wrote:

> I wonder would be the time difference for say 100 objects? You can send individual requests in parallel, right?

martinklepsch commented 8 years ago

400 objects w/o any parallel processing:

65s for getting data individually per object
1s for getting object summaries
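
A back-of-the-envelope reading of these numbers, as a sketch: 65 s over 400 objects is roughly 0.16 s per request. The function below extrapolates from that single measurement and assumes ideal scaling across workers, which real requests will not quite reach; the listing time is also specific to this bucket and would grow with it.

```python
PER_REQUEST_SECONDS = 65 / 400  # from the measurement above
LISTING_SECONDS = 1.0           # summaries for the same 400-object bucket


def estimate_individual_seconds(n_objects, workers=1):
    """Rough time for n individual requests, assuming ideal parallel scaling."""
    return n_objects * PER_REQUEST_SECONDS / workers


def individual_is_faster(n_objects, workers=1):
    """Compare the estimate against the measured listing time."""
    return estimate_individual_seconds(n_objects, workers) < LISTING_SECONDS
```

Under these assumptions, sequential individual requests only beat the listing for a handful of changed objects, and parallelism raises that threshold in proportion to the number of workers.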

martinklepsch commented 8 years ago

I did some very basic work towards this in the individual-diff branch. I'll release a 0.1.0 without it now and then we can cut a release with this later when it's done.

confetti-clj / s3-deploy

Consider retrieving object summaries individually #10