IGS / portal_client

Python-based client for downloading data made available through portals powered by the GDC-based portal system.
MIT License

portal_client download extremely slow compared to aws s3 cp #10

JoeEmendo closed this issue 4 years ago

JoeEmendo commented 5 years ago

I'm working on AWS EC2.

If I use portal_client -m manifest.tsv, the download is ~100x slower than with aws s3 cp. This holds regardless of the destination path (S3 bucket, /ebs, or /efs).

Is the "slowness" due to md5 checks? Something else? portal_client seems like a very valuable tool, but if it's 100x slower I'm not sure it pays off.

I get the same results in both eu-west-1 and us-east-1.

victor73 commented 5 years ago

portal_client uses the boto3 Python library for S3 file transfer, which I believe is the same library that the AWS CLI itself uses, so it's hard to say why you're seeing a slowdown. If you suspect the checksumming is the problem, can you try the same transfer with the --disable-validation option? That should skip MD5 checksumming so that you can rule it out.
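
A rough way to judge whether MD5 checksumming could plausibly account for the gap is to time an MD5 pass over a local file and compare that rate with the download rate you observe. The following is only a minimal sketch using the Python standard library; it is not portal_client's own validation code, and the names md5_of_file and md5_throughput_mb_s are hypothetical helpers made up for this illustration.

```python
import hashlib
import os
import tempfile
import time


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_throughput_mb_s(path):
    """Time one MD5 pass over the file and return throughput in MiB/s."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    start = time.perf_counter()
    md5_of_file(path)
    elapsed = time.perf_counter() - start
    return size_mb / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Write a 64 MiB scratch file so the example does not depend on real data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(64 * 1024 * 1024))
        scratch = tmp.name
    try:
        print(f"MD5 throughput: {md5_throughput_mb_s(scratch):.0f} MiB/s")
    finally:
        os.remove(scratch)
```

If the rate this reports is far above the transfer rate you see with portal_client, checksumming alone is unlikely to explain a 100x difference, and the --disable-validation comparison should confirm that.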