Closed chriskuehl closed 7 years ago
On a 782 MB virtualenv (probably fairly representative of the kind of artifacts we are likely to have), here are some timings (in seconds) on a c3.8xlarge:
|  | Python 3.6's tarfile | GNU tar |
|---|---|---|
| without gzip | 11.06s | 4.65s |
| with gzip | 122.47s | 34.04s |
Artifact size with gzip was 219 MB, without was 720 MB (roughly the same with both tar and Python).
So I'll probably stop using tarfile. I may also consider not gzipping in the future, or at least providing an option to turn it off. Transferring 500 MB takes about 4 seconds at 1 Gbps, which may not be worth the extra time spent compressing and decompressing.
For completeness, here's how I was testing: https://i.fluffy.cc/b5lHSvhSXxfBnG082w6VCmP5N2R96gwQ.html
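For reference, a minimal sketch of how such a comparison could be timed (the helper names and the `time.monotonic` approach are my own illustration, not necessarily what the linked script does):

```python
import subprocess
import tarfile
import time


def time_tarfile(src, dest, gzip=False):
    """Archive `src` with the stdlib tarfile module; return elapsed seconds."""
    mode = "w:gz" if gzip else "w"
    start = time.monotonic()
    with tarfile.open(dest, mode) as tf:
        tf.add(src)
    return time.monotonic() - start


def time_gnu_tar(src, dest, gzip=False):
    """Archive `src` by shelling out to GNU tar; return elapsed seconds."""
    flags = "-czf" if gzip else "-cf"
    start = time.monotonic()
    subprocess.check_call(("tar", flags, dest, src))
    return time.monotonic() - start
```

Shelling out to `tar` lets it use its own buffered I/O (and, with `-z`, an external gzip pipeline), which is where most of the speedup over the pure-Python `tarfile`/`gzip` path comes from.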
cc @asottile
wow, I'm surprised the stdlib is that much slower than tar. I agree that gzip probably isn't going to save us much when the destination is S3 in the same region
How much faster is it?