HDFGroup / nasa_cloud

Apache License 2.0

Improve performance of rangeget requests #3

Closed: jreadey closed this 1 month ago

jreadey commented 1 year ago

The Rangeget proxy does not seem to make much difference in the benchmark, even though it was designed with datasets such as ICESat-2 in mind. Investigate ways to improve its performance.

jreadey commented 1 year ago

One idea is to have the DN (data node) make two async requests on a rangeget: one directly to S3 for the exact number of bytes needed, and one to the rg proxy, which will read pagesize bytes from S3 (if the data is not already present). If the DN->S3 request returns first, use that and cancel the rg request. If the rg request returns first (likely because the data was found in the cache), cancel the DN->S3 request. The idea is that the DN should not have to wait while the rg proxy reads more bytes from S3 than are needed.
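A minimal sketch of the "race two requests, cancel the loser" idea, using Python's asyncio. This is an illustration under stated assumptions, not the HSDS implementation: `fetch_direct`, `fetch_via_proxy`, and `race_rangeget` are hypothetical names, and both fetches are simulated with `asyncio.sleep` instead of real S3 or proxy calls. Each simulated fetch returns bytes tagged with its source so the winner is visible.

```python
import asyncio

# Hypothetical stand-ins for the two paths; real code would issue an S3 GET
# (direct path) and an HTTP request to the rangeget proxy.
async def fetch_direct(offset: int, length: int, latency: float) -> bytes:
    """Simulate a DN->S3 GET for exactly `length` bytes."""
    await asyncio.sleep(latency)
    return b"d" * length

async def fetch_via_proxy(offset: int, length: int, latency: float) -> bytes:
    """Simulate the rg proxy returning the requested slice (cache hit or page read)."""
    await asyncio.sleep(latency)
    return b"p" * length

async def race_rangeget(offset: int, length: int,
                        direct_latency: float, proxy_latency: float) -> bytes:
    """Start both requests concurrently, return the first successful result,
    and cancel whichever request is still outstanding."""
    pending = {
        asyncio.create_task(fetch_direct(offset, length, direct_latency)),
        asyncio.create_task(fetch_via_proxy(offset, length, proxy_latency)),
    }
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # A failed request does not win; keep waiting on the other one.
                if task.exception() is None:
                    return task.result()
        raise RuntimeError("both range requests failed")
    finally:
        # Cancel the loser and wait for the cancellation to finish.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)

if __name__ == "__main__":
    # Cache hit: the proxy answers first, so the direct S3 request is cancelled.
    print(asyncio.run(race_rangeget(0, 4, direct_latency=0.2, proxy_latency=0.01)))  # b'pppp'
    # Cache miss: the exact-size S3 read wins, so the proxy request is cancelled.
    print(asyncio.run(race_rangeget(0, 4, direct_latency=0.01, proxy_latency=0.2)))  # b'dddd'
```

Treating a failed request as a non-winner means a proxy error falls back to the direct read (and vice versa). The trade-off is extra S3 traffic, since both requests are issued on every rangeget.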

jreadey commented 1 month ago

The Rangeget Proxy has been removed. Performance improvements for rangegets have been directed toward Intelligent Range Gets (https://github.com/HDFGroup/nasa_cloud/milestone/1) and hyper-chunking.