Closed tomc603 closed 9 years ago
Sorry for the long delay. Work got in the way of writing a comparison between Go and Python. I have two test scenarios: the first is a 1GB file of data from /dev/zero, bzip2-compressed; the second is a 1GB file of data from /dev/urandom, also bzip2-compressed. The first should be a best case for performance, since all of the data is RLE-encoded and the compressed file is a few hundred bytes. The second should be a worst case, since the data is generally not compressible and the compressed file is larger than the source.

Results:

Decompressing /home/tcameron/tmp/decompress/zeros.data.bz2
Go 1.1 decompress time: 3.212 sec
Py 2.7 decompress time: 3.070 sec

Decompressing /home/tcameron/tmp/decompress/random.data.bz2
Go 1.1 decompress time: 528.765 sec
Py 2.7 decompress time: 104.724 sec

Let's call the zeros.data.bz2 test even. Milliseconds for this file do not really interest me. It is worth noting that Python's version is faster, but by less than a quarter of a second; that could be down to lots of things, and I'm not necessarily interested in tracking them down.

The random.data.bz2 test is much more enlightening. Slower by a factor of >5 is surprising to me, and it equates to roughly 1.9 MB/sec. I understand there hasn't been much effort to optimize the bzip2 library for speed, so I figured my real-world experience could be used to help the project in some way.

My actual use case is a syslog file parser, which I've been writing to replace a Python script I previously wrote and to drive the lessons of Go into my brain. I see very similar results with text file processing, but since I can't offer the text files themselves for others to test with, I've tried something more reproducible.

These tests were performed on a Lenovo T430 with an SSD, an Intel Core i5-3320M CPU @ 2.60GHz, and 8GB RAM, plugged into an AC power source. The operating system is Ubuntu 13.10 with kernel 3.11.0-13-generic, x86_64 architecture.
The source for each test application is in my GitHub repos:
https://github.com/tomc603/pycompresstest
https://github.com/tomc603/gocompresstest
After running the same tests with Go 1.2rc5 a couple of times just to confirm I'm not crazy (still a possibility), it seems RLE-encoded data actually decompresses twice as slowly as under Go 1.1. For these particular tests I'm not seeing a 30% increase in speed, though I am exercising the two most extreme cases.

Results:

Decompressing /home/tcameron/tmp/decompress/zeros.data.bz2
Go 1.1 decompress time: 3.000 sec
Go 1.2rc5 decompress time: 6.612 sec

Decompressing /home/tcameron/tmp/decompress/random.data.bz2
Go 1.1 decompress time: 534.020 sec
Go 1.2rc5 decompress time: 499.078 sec
CL https://golang.org/cl/131840043 mentions this issue.
CL https://golang.org/cl/131470043 mentions this issue.
CL https://golang.org/cl/13852 mentions this issue.
CL https://golang.org/cl/13853 mentions this issue.