awslabs / mountpoint-s3

A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
Apache License 2.0

Reduce the number of S3 API calls when the same object is requested at the same time #959

Open olileach opened 4 months ago

olileach commented 4 months ago

Tell us more about this new feature.

For HPC workloads, a submitted job can contain tens of thousands of tasks, each carrying instructions for a required calculation. In these jobs we often see the same object being requested at the same time by many clients running in pods or on EC2 instances. A typical example is a job submitted to a queue-based architecture, where existing pods or EC2 instances are already up and running and actively consuming messages from the queue. When this happens, multiple pods and instances often request the same file in an S3 bucket in order to complete the calculation. With Mountpoint for Amazon S3 (and the CSI driver, if using Amazon EKS), the impact is thousands of S3 API calls issued at the same time to fetch the same object. This causes the following error:

mountpoint_s3::fuse: lookup failed: inode error: error from ObjectClient: ListObjectsV2 failed: Client error: Unknown CRT error: CRT error 14342: aws-c-s3:AWS_ERROR_S3_SLOW_DOWN, Response code indicates throttling

A more efficient approach would be for Mountpoint to know whether an object is already being requested and, if so, hold subsequent requests until the object has been fetched and is available in the configured cache location. Later client requests could then read the file locally rather than making additional API calls to Amazon S3. This enhancement would dramatically cut the number of Amazon S3 API calls for HPC workloads, improve overall job performance, and reduce cost.
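To make the request concrete, here is a minimal sketch of the coalescing behaviour described above (often called "single-flight"). It is not Mountpoint code: the `SingleFlight` class, the `fetch` callable, and the in-memory `_cache` dictionary are all hypothetical names used for illustration, and the sketch only covers callers inside one process.

```python
import threading
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent requests for the same key into one fetch."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: key -> bytes
        self._lock = threading.Lock()
        self._inflight = {}      # key -> Future shared by all waiters
        self._cache = {}         # key -> bytes (stand-in for a local cache)

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]          # served locally, no fetch
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut        # first caller does the fetch
        if not leader:
            return fut.result()                  # wait for the leader's result
        try:
            data = self._fetch(key)
        except Exception as exc:
            with self._lock:
                del self._inflight[key]
            fut.set_exception(exc)               # waiters see the same error
            raise
        with self._lock:
            self._cache[key] = data
            del self._inflight[key]
        fut.set_result(data)
        return data
```

With this shape, the first caller for a key performs the only remote fetch, concurrent callers block on the shared future, and later callers read from the cache. Coalescing across separate processes or hosts would need some shared coordination, which this sketch does not attempt.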

arsh commented 3 months ago

We are looking into this internally and will post an update as soon as we have one.

vladem commented 2 months ago

We've looked into this issue and can say that, when a single Mountpoint process is used, the number of ListObjectsV2 requests may be reduced by enabling the metadata cache with the --metadata-ttl flag.

As for reducing the total number of ListObjectsV2 requests emitted by multiple Mountpoint processes running on the same machine, we don't have that feature planned at this point.
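As a rough illustration of why a TTL-based metadata cache helps only within a single process, consider the sketch below. It is not Mountpoint's implementation: `TtlMetadataCache`, `lookup`, and the injectable `clock` are hypothetical names, and `lookup` stands in for whatever remote call (such as a ListObjectsV2 request) resolves a key's metadata.

```python
import time


class TtlMetadataCache:
    """Remember lookup results for ttl_seconds to avoid repeat remote calls."""

    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = lookup    # hypothetical callable: key -> metadata
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, metadata)

    def stat(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # fresh entry: no remote call
        metadata = self._lookup(key)             # missing or expired: refetch
        self._entries[key] = (now + self._ttl, metadata)
        return metadata
```

Because `_entries` lives in one process's memory, a second process starts with an empty cache and issues its own lookups, which matches the limitation described above for multiple Mountpoint processes.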

olileach commented 2 months ago

@vladem - we are configuring the metadata-ttl as described here:

https://github.com/awslabs/mountpoint-s3-csi-driver/tree/main/examples/kubernetes/static_provisioning

How would metadata-ttl help with the HPC use case described above, where many processes read the same file at the same time?

dannycjones commented 2 months ago

Discussed offline: