aws / aws-tools-for-powershell

The AWS Tools for PowerShell lets developers and administrators manage their AWS services from the PowerShell scripting environment.
Apache License 2.0

Read-S3Object and Copy-S3Object significantly slower than aws-cli when downloading large files. #63

Closed aiell0 closed 4 years ago

aiell0 commented 4 years ago

When using the Read-S3Object and Copy-S3Object commands in PowerShell, all files take longer to download than with the aws-cli. While this may be acceptable for small files, the gap becomes glaring with large files, which can make using S3 from Windows PowerShell difficult.

Expected Behavior

When downloading a file from S3 that is 50GB in size, I expected it to be done in 4-5 minutes, especially if the bucket was in the same region as the server.

Current Behavior

The file took about 18 minutes to download with both the Read-S3Object and Copy-S3Object functions.
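To put the two figures side by side, here is a rough sketch that converts them into average throughput. Get-ThroughputMBps is a hypothetical helper written for this illustration, not part of AWS Tools for PowerShell, and it assumes "50GB" means 50 * 1024^3 bytes.

```powershell
# Convert the reported size and durations into average throughput.
# 50GB is a PowerShell numeric literal equal to 50 * 1024^3 bytes.
$sizeBytes = 50GB

# Hypothetical helper: average rate in MiB per second, rounded to one decimal.
function Get-ThroughputMBps {
    param([double]$Bytes, [double]$Seconds)
    [math]::Round(($Bytes / 1MB) / $Seconds, 1)
}

$observed = Get-ThroughputMBps -Bytes $sizeBytes -Seconds (18 * 60)  # about 47.4
$expected = Get-ThroughputMBps -Bytes $sizeBytes -Seconds (5 * 60)   # about 170.7

"Observed: $observed MB/s, expected at least: $expected MB/s"
```

In other words, the observed download averaged roughly a quarter of the expected rate.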

Possible Solution

The following article offers a possible solution, along with some explanation of the original problem. Its conclusion is that the two PowerShell cmdlets mentioned above should parallelize transfers to make better use of instance resources.

Steps to Reproduce (for bugs)

Run Read-S3Object -BucketName $bucketName -KeyPrefix "$keyPrefix" -Folder "$folder", where $keyPrefix points to a directory containing large files.
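A minimal sketch of how the download could be timed so the numbers are comparable across shells. The bucket name, key prefix, and folder below are placeholder assumptions, and the script assumes the AWS.Tools.S3 module is installed and credentials are already configured.

```powershell
# Placeholder values (assumptions): replace with your own bucket, prefix, and folder.
$bucketName = 'my-example-bucket'
$keyPrefix  = 'backups/'
$folder     = 'D:\restore'

# Record which PowerShell produced the measurement.
"PowerShell version: $($PSVersionTable.PSVersion)"

# Measure-Command runs the script block and returns the elapsed time as a TimeSpan.
$elapsed = Measure-Command {
    Read-S3Object -BucketName $bucketName -KeyPrefix $keyPrefix -Folder $folder
}

"Read-S3Object took {0:N1} minutes" -f $elapsed.TotalMinutes
```

Running the same script in Windows PowerShell 5.1 and in PowerShell 6+ should make any difference between the two visible.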

Context

We work in an enterprise IT environment where Microsoft SQL backups are stored in S3. We would like to download these backups from S3 rather than store them directly on EBS, due to cost. When creating CloudFormation stacks, part of the UserData script downloads these backups so they can be restored shortly after. The download took a long time, which significantly increased the deployment time of the stack. As a result, we are now calling the aws-cli from inside PowerShell scripts for this purpose. It would be much better engineering if we could just use the PowerShell tooling.
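For reference, the aws-cli workaround described above might look roughly like the following inside a PowerShell script. This is a sketch, not our exact UserData script: the bucket, prefix, and folder values are placeholders, and it assumes the aws executable is on the PATH.

```powershell
# Placeholder values (assumptions): replace with your own bucket, prefix, and folder.
$bucketName = 'my-example-bucket'
$keyPrefix  = 'backups/'
$folder     = 'D:\restore'

# Recursively copy everything under the prefix to the local folder.
aws s3 cp "s3://$bucketName/$keyPrefix" $folder --recursive

# By default, native commands do not throw on failure, so check the exit code.
if ($LASTEXITCODE -ne 0) {
    throw "aws s3 cp failed with exit code $LASTEXITCODE"
}
```

The explicit exit-code check matters in a UserData script, where a silently failed download would only surface later during the restore.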

Your Environment

Include as many relevant details as possible about the environment where the bug was discovered.

matteo-prosperi commented 4 years ago

Hello, I see that you are using AWS.Tools.S3 from Windows PowerShell (5.1 or earlier). We have noticed before that AWS.Tools and AWSPowerShell.NetCore have slower transfer speeds on Windows PowerShell than PowerShell Core.

Would you be able to check if the transfer speed is faster when using PowerShell Core 6+?

Thanks
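If it helps, the following sketch shows one way to confirm which PowerShell edition and which AWS modules are in play. The three module names are the packages mentioned in this thread plus the legacy AWSPowerShell package; none of them has to be installed for the commands to run.

```powershell
# PSEdition is 'Desktop' for Windows PowerShell 5.1 and 'Core' for PowerShell 6+.
$PSVersionTable.PSVersion
$PSVersionTable.PSEdition

# List any installed AWS PowerShell modules and their versions.
Get-Module -ListAvailable -Name AWS.Tools.S3, AWSPowerShell, AWSPowerShell.NetCore |
    Select-Object Name, Version
```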

BrucePay commented 4 years ago

Also, in PowerShell 7, the ForEach-Object cmdlet has been enhanced to support parallel execution with the -Parallel option. (Note - PS7 is still in Beta.) You can read more about this functionality here: PowerShell ForEach-Object Parallel Feature. Another option is to use the workflow functionality in PS5.1; see About ForEach-Parallel. This works, but it comes with a bunch of other overhead, so I'd recommend giving PS7 a try.
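As an untested sketch of the PS7 approach, the objects under a prefix could be listed with Get-S3Object and then downloaded concurrently. The bucket, prefix, folder, and throttle limit are placeholder assumptions. Each parallel script block runs in its own runspace, so this also assumes credentials come from a stored default profile or an instance role rather than from variables set only in the calling session. Note that this parallelizes across objects; it does not split a single large object into parts.

```powershell
#Requires -Version 7.0

# Placeholder values (assumptions): replace with your own bucket, prefix, and folder.
$bucketName = 'my-example-bucket'
$keyPrefix  = 'backups/'
$folder     = 'D:\restore'

# List the objects under the prefix, skip zero-byte "folder" placeholder keys,
# and download up to 8 objects at a time. Variables from the calling scope
# must be referenced with $using: inside the parallel script block.
Get-S3Object -BucketName $bucketName -KeyPrefix $keyPrefix |
    Where-Object { $_.Size -gt 0 } |
    ForEach-Object -ThrottleLimit 8 -Parallel {
        $target = Join-Path $using:folder $_.Key
        Read-S3Object -BucketName $using:bucketName -Key $_.Key -File $target | Out-Null
    }
```

The throttle limit of 8 is arbitrary; a sensible value depends on the instance's network bandwidth and disk throughput.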

ashishdhingra commented 4 years ago

Hi @aiell0,

Please advise if you were able to check the transfer speed when using PowerShell Core 6+. If this is no longer an issue, kindly confirm if we can close this issue.

Thanks, Ashish

aiell0 commented 4 years ago

@BrucePay and @matteo-prosperi,

Thank you for this advice! We ended up using PowerShell Core 6+ and that was much faster. This issue can be closed. Appreciate the patience and the reminder, @ashishdhingra!