eeroel closed this issue 10 months ago.
This is something we would like to add support for, but it is not currently a high priority.
@eeroel Thank you for creating the issue. I have implemented client-side handling of the Range header: if the Range header includes a start offset, we no longer perform a HeadObject request. Does this solve your issue?
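The rule described in this comment can be illustrated with a small classifier for the Range header. This is a minimal Python sketch, not the aws-c-s3 C implementation; the function name `needs_head_object` is hypothetical, and the sketch only handles single byte ranges of the forms `bytes=START-END`, `bytes=START-` and `bytes=-N`.

```python
import re

# Matches a single byte range: "bytes=START-END", "bytes=START-" or "bytes=-N".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def needs_head_object(range_header):
    """Hypothetical helper: True if a HeadObject request is still needed.

    Follows the condition stated above: a Range header that includes a
    start offset can be sent without first asking for the object size.
    """
    match = _RANGE_RE.match(range_header.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {range_header!r}")
    start, end = match.groups()
    if start == "" and end == "":
        raise ValueError(f"empty byte range: {range_header!r}")
    # No start offset means a suffix range such as "bytes=-8".
    return start == ""
```

In this sketch, a suffix range such as `bytes=-8` still returns `True`, matching the comment's condition that a start offset must be present.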
Nice, yes it does!
@eeroel Thanks, this is resolved in https://github.com/awslabs/aws-c-s3/releases/tag/v0.4.8.
Describe the feature
Would it be possible to avoid the HeadObject request when doing a ranged GET request? I noticed this comment in the source, but I wonder whether this is feasible, or already planned: https://github.com/awslabs/aws-c-s3/blob/83008e577804643bc632ae4e603f36ab96219b9b/source/s3_auto_ranged_get.c#L166
Use Case
When reading data in Parquet format (e.g. in data lake applications), the file footer needs to be read first, so an implementation that reads from S3 has to start with a HeadObject request and therefore already knows the object size. The data itself may then be read in several small range requests, so making a redundant HeadObject request for each of them adds latency. I understand that this library is optimized for throughput, but it would be great if there were a way to get those performance benefits without introducing extra latency when the amount of data read is small.
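For context, the Parquet access pattern described above looks roughly like this: a Parquet file ends with the footer metadata, then a 4-byte little-endian footer length, then the 4-byte magic `PAR1`. Once the object size is known, both the fixed 8-byte tail and the footer itself can be fetched with explicit-start range requests. This is a minimal Python sketch; the helper names `tail_range` and `footer_range` are hypothetical and not part of any library.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def tail_range(object_size):
    """Range header for the fixed-size tail at the end of a Parquet file."""
    # Smallest possible layout: leading magic (4) + length (4) + magic (4).
    if object_size < 12:
        raise ValueError("object too small to be a Parquet file")
    return f"bytes={object_size - TAIL_SIZE}-{object_size - 1}"


def footer_range(tail, object_size):
    """Given the last 8 bytes, return the Range header for the footer metadata."""
    if len(tail) != TAIL_SIZE or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing Parquet magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = object_size - TAIL_SIZE - footer_len
    # The footer must not overlap the leading "PAR1" magic.
    if start < len(PARQUET_MAGIC):
        raise ValueError("footer length exceeds object size")
    return f"bytes={start}-{object_size - TAIL_SIZE - 1}"
```

The column chunks located through the footer are then read with further `bytes=START-END` requests, each of which would otherwise be preceded by its own HeadObject request.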
Proposed Solution
I'm not familiar with the internals of the auto-ranged GET implementation, but maybe the first request could be made for the last range (at the end of the object), so that an HTTP 416 (Range Not Satisfiable) error would be returned if the range is out of bounds?
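A related point: in HTTP, a 206 Partial Content response to a ranged GET carries a `Content-Range` header such as `bytes 0-99/12345`, whose last field is the complete object length (or `*` if unknown), and a 416 response typically carries `bytes */12345`. So the first ranged GET can reveal the object size without a separate HeadObject request. The Python sketch below parses that header and plans the remaining parts; `parse_content_range`, `remaining_parts` and the part size are hypothetical, and this is not how aws-c-s3 implements it.

```python
import re

# "bytes FIRST-LAST/LENGTH" or the unsatisfied form "bytes */LENGTH";
# LENGTH may be "*" when the complete length is unknown.
_CONTENT_RANGE_RE = re.compile(r"^bytes (?:(\d+)-(\d+)|\*)/(\d+|\*)$")


def parse_content_range(value):
    """Return (first, last, complete_length), using None for absent fields."""
    match = _CONTENT_RANGE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed Content-Range: {value!r}")
    first, last, total = match.groups()
    return (
        int(first) if first is not None else None,
        int(last) if last is not None else None,
        int(total) if total != "*" else None,
    )


def remaining_parts(content_range, part_size):
    """Range headers for the rest of the object after a first ranged GET."""
    _, last, total = parse_content_range(content_range)
    if last is None or total is None:
        raise ValueError("response does not report a satisfied range and size")
    return [
        f"bytes={start}-{min(start + part_size, total) - 1}"
        for start in range(last + 1, total, part_size)
    ]
```

For example, after a first response with `Content-Range: bytes 0-99/250` and a part size of 100, the remaining parts are `bytes=100-199` and `bytes=200-249`.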
Acknowledgements