chriskuehl / lazy-build

Remotely cache build artifacts based on file hashes
MIT License

Figure out if it's worth dropping tarfile and shelling out to tar instead #17

Closed · chriskuehl closed this issue 7 years ago

chriskuehl commented 7 years ago

How much faster is it?

chriskuehl commented 7 years ago

On a 782 MB virtualenv (probably fairly representative of the kind of artifacts we are likely to have), here are some timings (in seconds) on a c3.8xlarge:

| | Python 3.6's `tarfile` | GNU tar |
|---|---|---|
| without gzip | 11.06s | 4.65s |
| with gzip | 122.47s | 34.04s |

Artifact size with gzip was 219 MB; without, 720 MB (sizes were roughly the same for both GNU tar and Python).

So I'll probably stop using tarfile. I may also consider not gzipping in the future, or at least providing an option to turn it off: at 1 Gbps, 500 MB transfers in about 4 seconds, which may not be worth the extra time to compress and decompress.
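A quick back-of-the-envelope check of that tradeoff, using the sizes and GNU tar timings measured above (the 1 Gbps link speed is an assumption):

```python
# Does gzip pay for itself over a fast link?
# Sizes and timings are from the measurements above; the 1 Gbps
# link speed is an assumed value, not a measured one.

def transfer_seconds(size_mb: float, link_gbps: float = 1.0) -> float:
    """Ideal time to move size_mb megabytes over a link_gbps link."""
    return (size_mb * 8) / (link_gbps * 1000)

uncompressed_mb, compressed_mb = 720, 219
gzip_overhead_s = 34.04 - 4.65  # extra time GNU tar spends compressing

saved_s = transfer_seconds(uncompressed_mb) - transfer_seconds(compressed_mb)
print(f"transfer time saved by gzip: {saved_s:.1f}s")       # ~4s
print(f"compression overhead:        {gzip_overhead_s:.1f}s")  # ~29s
```

Under these assumptions, gzip spends roughly 29 extra seconds compressing to save only about 4 seconds of transfer time, which supports skipping compression on a fast link.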

For completeness, here's how I was testing: https://i.fluffy.cc/b5lHSvhSXxfBnG082w6VCmP5N2R96gwQ.html
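As a rough sketch of what such a timing comparison could look like (the linked paste is the actual test script; the helper names and the example `src` path below are hypothetical):

```python
import os
import subprocess
import tarfile
import time


def time_tarfile(src: str, dest: str, gzip: bool) -> float:
    """Archive src with the stdlib tarfile module; return elapsed seconds."""
    start = time.monotonic()
    with tarfile.open(dest, "w:gz" if gzip else "w") as tf:
        tf.add(src, arcname=os.path.basename(src))
    return time.monotonic() - start


def time_gnu_tar(src: str, dest: str, gzip: bool) -> float:
    """Archive src by shelling out to GNU tar; return elapsed seconds."""
    start = time.monotonic()
    subprocess.check_call([
        "tar", "-czf" if gzip else "-cf", dest,
        "-C", os.path.dirname(os.path.abspath(src)),
        os.path.basename(src),
    ])
    return time.monotonic() - start


# Usage (hypothetical path to the 782 MB virtualenv):
#   time_tarfile("venv", "out.tar.gz", gzip=True)
#   time_gnu_tar("venv", "out.tar.gz", gzip=True)
```

Shelling out avoids tarfile's per-member Python-level overhead, which is where most of the stdlib's extra time goes on large trees of small files.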

cc @asottile

asottile commented 7 years ago

wow, I'm surprised the stdlib is that much slower than tar. I agree that gzip probably isn't going to save us much when the destination is S3 in the same region.