awslabs / aws-c-s3

C99 library implementation for communicating with the S3 service, designed for maximizing throughput on high bandwidth EC2 instances.
Apache License 2.0

Handle range header client-side #370

Status: Closed (eeroel closed this issue 10 months ago)

eeroel commented 11 months ago

Describe the feature

Would it be possible to avoid the HeadObject request when making a ranged GET request? I noticed this comment in the source, and I wonder whether it is feasible or already planned: https://github.com/awslabs/aws-c-s3/blob/83008e577804643bc632ae4e603f36ab96219b9b/source/s3_auto_ranged_get.c#L166

Use Case

When reading data in Parquet format (e.g. in data lake applications), the file footer needs to be read first, so an implementation that reads from S3 has to start with a HeadObject request and therefore already knows the object size. The data itself may then be read in several small range requests, so a redundant HeadObject request for each of them adds latency. I understand that this library is optimized for throughput, but it would be great to have those performance benefits without the extra latency when the amount of data read is small.
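
To make the access pattern concrete, here is a minimal sketch in C of how a reader that already knows the object size could compute its footer ranges. It assumes an unencrypted Parquet file, whose last 8 bytes are a 4-byte little-endian footer length followed by the magic bytes "PAR1". The helper names (`format_range`, `parquet_tail_range`, `parquet_footer_range`) are hypothetical and are not part of aws-c-s3.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Length of the Parquet tail: 4-byte footer length + 4-byte magic. */
#define PARQUET_TAIL_LEN 8u

/* Hypothetical helper: formats an inclusive byte range as a Range header value. */
static int format_range(char *buf, size_t cap, uint64_t first, uint64_t last) {
    return snprintf(buf, cap, "bytes=%" PRIu64 "-%" PRIu64, first, last);
}

/* Computes the inclusive range covering the last 8 bytes of the object. */
static bool parquet_tail_range(uint64_t object_size, uint64_t *first, uint64_t *last) {
    if (object_size < PARQUET_TAIL_LEN) {
        return false;
    }
    *first = object_size - PARQUET_TAIL_LEN;
    *last = object_size - 1;
    return true;
}

/* Given the 8 tail bytes, computes the inclusive range of the footer metadata. */
static bool parquet_footer_range(
    uint64_t object_size,
    const uint8_t tail[8],
    uint64_t *first,
    uint64_t *last) {

    if (object_size < PARQUET_TAIL_LEN || memcmp(tail + 4, "PAR1", 4) != 0) {
        return false;
    }
    /* Footer length is stored little-endian in the first 4 tail bytes. */
    uint32_t footer_len = (uint32_t)tail[0] | ((uint32_t)tail[1] << 8) |
                          ((uint32_t)tail[2] << 16) | ((uint32_t)tail[3] << 24);
    if (footer_len == 0 || footer_len > object_size - PARQUET_TAIL_LEN) {
        return false;
    }
    *first = object_size - PARQUET_TAIL_LEN - footer_len;
    *last = object_size - PARQUET_TAIL_LEN - 1;
    return true;
}
```

For a 1000-byte object with a 100-byte footer, this yields `bytes=992-999` for the tail and `bytes=892-991` for the footer metadata. Both ranges have an explicit start and end, so in principle neither needs the server to report the object size again.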

Proposed Solution

I'm not familiar with the internals of the auto-range request implementation, but maybe the first request could be made to the last range (at the end of the object) so that an Unsatisfiable error will be returned if the range is out of bounds?

Other Information

No response

jmklix commented 11 months ago

We would like to add support for this, but it is not currently a high priority.

waahm7 commented 10 months ago

@eeroel Thank you for creating the issue. I have implemented client-side handling of the Range header for requests whose range includes a start position. In that case, we no longer send a HeadObject request. Does this solve your issue?
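
The decision described in this comment can be illustrated with a short sketch in C. It is not the aws-c-s3 implementation: `struct byte_range`, `parse_range_header`, and `needs_head_request` are hypothetical names, and the logic assumes a single-range header value in one of the forms `bytes=A-B`, `bytes=A-`, or `bytes=-N`. The only rule taken from the thread is that a range with a start position does not need a preliminary HeadObject request.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type, not part of aws-c-s3. For the suffix form "bytes=-N",
 * has_start is false and end holds the suffix length N. */
struct byte_range {
    bool has_start;
    bool has_end;
    uint64_t start;
    uint64_t end;
};

/* Parses a run of decimal digits at *cursor, rejecting overflow. */
static bool parse_u64(const char **cursor, uint64_t *out) {
    const char *p = *cursor;
    if (*p < '0' || *p > '9') {
        return false;
    }
    uint64_t value = 0;
    while (*p >= '0' && *p <= '9') {
        uint64_t digit = (uint64_t)(*p - '0');
        if (value > (UINT64_MAX - digit) / 10) {
            return false; /* would overflow uint64_t */
        }
        value = value * 10 + digit;
        ++p;
    }
    *cursor = p;
    *out = value;
    return true;
}

/* Parses "bytes=A-B", "bytes=A-" or "bytes=-N". Returns false on any other input. */
static bool parse_range_header(const char *value, struct byte_range *out) {
    static const char prefix[] = "bytes=";
    if (strncmp(value, prefix, sizeof(prefix) - 1) != 0) {
        return false;
    }
    const char *p = value + sizeof(prefix) - 1;
    struct byte_range r = {0};

    if (*p != '-') {
        if (!parse_u64(&p, &r.start)) {
            return false;
        }
        r.has_start = true;
    }
    if (*p != '-') {
        return false;
    }
    ++p;
    if (*p != '\0') {
        if (!parse_u64(&p, &r.end)) {
            return false;
        }
        r.has_end = true;
    }
    if (*p != '\0' || (!r.has_start && !r.has_end)) {
        return false; /* trailing characters, or the empty form "bytes=-" */
    }
    if (r.has_start && r.has_end && r.start > r.end) {
        return false;
    }
    *out = r;
    return true;
}

/* Assumption based on the thread: only a range without a start position
 * (the suffix form) still requires the object size up front. */
static bool needs_head_request(const struct byte_range *r) {
    return !r->has_start;
}
```

Under these assumptions, `bytes=100-199` and `bytes=100-` would skip the HeadObject request, while `bytes=-500` would still need the object size first.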

eeroel commented 10 months ago

> @eeroel Thank you for creating the issue. I have implemented client-side handling of the Range header for requests whose range includes a start position. In that case, we no longer send a HeadObject request. Does this solve your issue?

Nice, yes it does!

waahm7 commented 10 months ago

@eeroel Thanks, this is resolved in https://github.com/awslabs/aws-c-s3/releases/tag/v0.4.8.