diux-dev / cluster

train on AWS

improve EFS copying speed #35

Closed yaroslavvb closed 6 years ago

yaroslavvb commented 6 years ago

Right now EFS reading is at 0.5x the expected speed. Below is the reply from AWS support:


Hello,

Thanks for contacting AWS support. My name is Ed and I will be working with you on this case.

I understand that you are doing an rsync to copy 150 GB, but the throughput is lower than your EFS max throughput. Your EFS size is 5.7 TiB, so, as you said, your max throughput should be around 570 MiB/s (286.8 MiB/s is your baseline throughput). As you mentioned, you do have burst credits; however, keep in mind that there is a hard per-instance throughput limit of 250 MiB/s in EFS[1].
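For reference, the numbers in the reply follow from the published bursting-mode rates of 50 MiB/s baseline and 100 MiB/s burst per TiB stored; the sketch below just multiplies those rates by the 5.7 TiB size from the reply (the per-TiB rates are from the EFS docs, not this thread):

```python
def efs_throughput_mib_s(size_tib):
    """Return (baseline, burst) throughput in MiB/s for a bursting-mode EFS file system."""
    baseline = 50.0 * size_tib   # 50 MiB/s per TiB stored
    burst = 100.0 * size_tib     # 100 MiB/s per TiB while burst credits last
    return baseline, burst

baseline, burst = efs_throughput_mib_s(5.7)
print(f"baseline = {baseline:.0f} MiB/s, burst = {burst:.0f} MiB/s")
# prints: baseline = 285 MiB/s, burst = 570 MiB/s
```

The small difference from the 286.8 MiB/s quoted above is presumably because support used the exact metered size rather than a rounded 5.7 TiB.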

Looking at your Cloudwatch metrics (TotalIOBytes), I see that you were able to reach your max throughput in different periods in the last 24 hours:

TotalIOBytes: https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2:graph=~%28region~%27us-east-1~metrics~%28~%28~%27AWS*2fEFS~%27TotalIOBytes~%27FileSystemId~%27fs-de3d0697%29%29~period~300~stat~%27Sum~start~%272018-07-30T06*3a34*3a00Z~end~%272018-07-31T06*3a34*3a00Z%29
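The same TotalIOBytes sums can be pulled programmatically instead of through the console link. A minimal sketch using boto3's CloudWatch `get_metric_statistics` call, assuming credentials and region are already configured (the file-system id is the one from the reply):

```python
import datetime

def total_io_request(fs_id, hours=24, period=300):
    """Build the get_metric_statistics kwargs for the EFS TotalIOBytes metric."""
    end = datetime.datetime.utcnow()
    return dict(
        Namespace="AWS/EFS",
        MetricName="TotalIOBytes",
        Dimensions=[{"Name": "FileSystemId", "Value": fs_id}],
        StartTime=end - datetime.timedelta(hours=hours),
        EndTime=end,
        Period=period,          # 300 s buckets, matching the console link
        Statistics=["Sum"],
    )

# import boto3
# cw = boto3.client("cloudwatch", region_name="us-east-1")
# datapoints = cw.get_metric_statistics(**total_io_request("fs-de3d0697"))["Datapoints"]
```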

Even if your instance and EFS support 250 MiB/s, you might not be able to reach this throughput from a single client in some situations. Amazon EFS file systems are distributed across an unconstrained number of storage servers, enabling file systems to grow elastically to petabyte scale and allowing massively parallel access from Amazon EC2 instances to your data. Amazon EFS's distributed design avoids the bottlenecks and constraints inherent to traditional file servers. This distributed data storage design means that multi-threaded applications, and applications that concurrently access data from multiple Amazon EC2 instances, can drive substantial levels of aggregate throughput and IOPS. In addition, Amazon EFS data is distributed across multiple Availability Zones (AZs), providing a high level of durability and availability.

While the distributed architecture of Amazon EFS enables high levels of availability, durability, and scalability, it results in a small latency overhead for each file operation. Due to this per-operation latency, overall throughput generally increases as the average I/O size increases (your average I/O size is just 17 KB), since the overhead is amortized over a larger amount of data.

The per-operation latency can make EFS performance differ from local file system performance. This is why performance is degraded for small read/write operations; it is a limitation of the EFS architecture.
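Since the per-operation latency dominates for small files, one common workaround is to keep many requests in flight at once rather than copying serially the way a single rsync does. A minimal sketch using a thread pool (paths and worker count are illustrative, not from this thread):

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(src_dir, dst_dir, workers=16):
    """Copy every file under src_dir to dst_dir, `workers` copies in flight at once."""
    os.makedirs(dst_dir, exist_ok=True)
    tasks = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for root, _dirs, files in os.walk(src_dir):
            out = os.path.join(dst_dir, os.path.relpath(root, src_dir))
            os.makedirs(out, exist_ok=True)
            for name in files:
                tasks.append(pool.submit(shutil.copy2,
                                         os.path.join(root, name),
                                         os.path.join(out, name)))
    for t in tasks:
        t.result()   # re-raise any per-file error
    return len(tasks)

# parallel_copy("/data/imagenet", "/efs/imagenet")  # hypothetical mount points
```

Threads are enough here because each copy is I/O-bound; the per-file EFS round-trip latency is overlapped across workers instead of paid serially.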

Lastly, you can find more information about EFS limits in the following link:

[1] https://docs.aws.amazon.com/efs/latest/ug/limits.html#limits-client-specific

I hope this information is helpful, if you have any questions, please let me know.

Best regards,

Ed F., Amazon Web Services

Check out the AWS Support Knowledge Center, a knowledge base of articles and videos that answer customer questions about AWS services: https://aws.amazon.com/premiumsupport/knowledge-center/?icmpid=support_email_category