I looked a little deeper into this by examining two files made by compressing the King James Bible with native Debian Linux gzip and with compress/gzip. The natively gzipped file contains 23 blocks encoded using dynamic Huffman trees. The compress/gzipped file contains 47 blocks encoded using dynamic Huffman trees, plus an empty block at the end containing no data at all. This seems to indicate that compress/gzip (actually compress/flate) decides to emit a new block much more frequently than native gzip. At the same time, the emission of blocks is partly controlled by the maxFlateBlockTokens constant in deflate.go. Changing this reduces the output size (but not by enough to reach native gzip levels).
Experimenting further with maxFlateBlockTokens, it's definitely possible to reduce the number of blocks emitted. With maxFlateBlockTokens changed from 1 << 14 to 1 << 16, compressing the same King James Bible produces 12 blocks. With it equal to 1 << 15, a total of 24 blocks are emitted. However, the file size has not changed greatly:

Standard compress/gzip = 1507879
With 1<<15 = 1506733
With 1<<16 = 1506281

So the problem must go deeper into the actual compression itself.
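To make the role of that constant concrete, here is a minimal, hypothetical sketch of the flush-when-full pattern it controls (this is not the real deflate.go code; token, blockWriter, and the limit value are illustrative): a larger per-block token limit means fewer, larger blocks for the same input, each with its own Huffman trees.

package main

import "fmt"

const maxTokens = 1 << 14 // hypothetical per-block token limit

type token uint32

type blockWriter struct {
	tokens []token
	blocks int
}

func (w *blockWriter) add(t token) {
	w.tokens = append(w.tokens, t)
	if len(w.tokens) == maxTokens {
		w.writeBlock() // a full buffer forces a new DEFLATE block
	}
}

func (w *blockWriter) writeBlock() {
	if len(w.tokens) == 0 {
		return
	}
	w.blocks++
	w.tokens = w.tokens[:0]
}

func main() {
	w := &blockWriter{}
	for i := 0; i < 400000; i++ { // pretend the input produced 400k tokens
		w.add(token(i))
	}
	w.writeBlock() // final partial block
	fmt.Println("blocks emitted:", w.blocks)
}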
Reproduced.

$ hg identify
0b98ba2443b0 tip

I have used the King James Bible from:

$ wget http://www.gutenberg.org/cache/epub/10/pg10.txt
4452069 pg10.txt

I used the following Go program:

$ cat gogzip.go
package main

import (
	"compress/gzip"
	"flag"
	"io"
	"os"
)

var n = flag.Int("n", 6, "Compression level (0-9)")

func main() {
	flag.Parse()
	c, _ := gzip.NewWriterLevel(os.Stdout, *n)
	io.Copy(c, os.Stdin)
}

$ cat pg10.txt | gzip -6 > pg10.linux.gz
$ cat pg10.txt | ./gogzip -n=6 > pg10.go.gz
1486154 pg10.go.gz
1404431 pg10.linux.gz

What I don't like is that my source file is not exactly the same as the topic starter's, but anyway: the issue exists. I will try to find out the source of the problem.
My program above has an error (a missing c.Close(), so bytes may get stuck in the buffer without being flushed to stdout, which is reproducible on all-zero input). The correct program is:

package main

import (
	"compress/gzip"
	"flag"
	"io"
	"os"
)

var n = flag.Int("n", 6, "Compression level (0-9)")

func main() {
	flag.Parse()
	c, _ := gzip.NewWriterLevel(os.Stdout, *n)
	io.Copy(c, os.Stdin)
	c.Close()
}

Test results:
1048576 zero.1M.bin
1051 zero.1M.bin.linux.gz
1055 zero.1M.bin.go.gz
4452069 pg10.txt
1404431 pg10.linux.gz
1507801 pg10.go.gz
The preview of the fix for lazy matching is here: http://golang.org/cl/5554066

Compression results:

level=6:
4452069 pg10.txt
1404431 pg10.linux.gz
1472391 pg10.go.gz

level=9:
4452069 pg10.txt
1392385 pg10.linux.gz
1467823 pg10.go.gz

The gap is still there, but it's getting better.
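For readers unfamiliar with the heuristic that CL implements, here is a rough, self-contained sketch of lazy matching (findMatch here is a simplified stand-in, not the compress/flate function): before committing to the match found at position i, the encoder peeks at i+1 and prefers a single literal plus the longer match when one starts there.

package main

import "fmt"

// findMatch is a hypothetical helper returning the longest back-reference
// length starting at pos (0 means no usable match).
func findMatch(data []byte, pos int) int {
	best := 0
	for start := 0; start < pos; start++ {
		n := 0
		for pos+n < len(data) && data[start+n] == data[pos+n] && n < 258 {
			n++
		}
		if n > best {
			best = n
		}
	}
	return best
}

func main() {
	data := []byte("abcde bcdefgh abcdefgh")
	i := 0
	for i < len(data) {
		cur := findMatch(data, i)
		if cur >= 3 && i+1 < len(data) && findMatch(data, i+1) > cur {
			fmt.Printf("literal %q (deferred: longer match at next position)\n", data[i])
			i++ // lazy step: emit a literal and reconsider at i+1
			continue
		}
		if cur >= 3 {
			fmt.Printf("match of length %d\n", cur)
			i += cur
		} else {
			fmt.Printf("literal %q\n", data[i])
			i++
		}
	}
}

On this input the greedy choice at the final "a" would be a length-5 match; the lazy step emits the "a" as a literal and takes the length-7 match starting at "b" instead.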
Not yet, because I see cases where Go produces slightly shorter results as well:

head -n 1000 pg10.txt > short.txt
42728 short.txt
13658 short.go.gz
13664 short.linux.gz

I would like to polish the fix above (by adding more DeflateInflate tests to ensure correctness) and after that understand this difference.
I see some room for improvement at the cost of some memory. Currently, the matching algorithm uses a 32k int array to store hashes, and hash collisions prevent finding a match. Naively increasing the hashHead size (and changing several related constants accordingly) gives the following results:

-6: 1463307 (vs 1472391 above)
-9: 1459164 (vs 1467823 above)

This is not immediately applicable because for some reason it slows down the encoder. Anyway, it's not a big deal to fix. Moreover, there is a good chance that using more memory would actually speed up the encoder, because the hash function could be simplified.

Another room for improvement is to allow decreasing the length of the previous match by 1 if its length > 3. It may make sense for text like the following:

zbcazzzzzzzzzzbc

Now, if I am correct, the algorithm produces something like:

[z][b][c][a][z][z][z](zzz)(zzzz)[b][c] // 11 tokens

It could be improved to:

[z][b][c][a][z][z][z](zzz)(zzz)(zbc) // 10 tokens

I will probably do something about this once http://golang.org/cl/5554066 is ready and committed.
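For context, a minimal sketch of the hash-chain structure being described (field names and the hash are illustrative assumptions, not the real compress/flate code): head maps a trigram hash to the most recent position with that hash, prev chains back to earlier positions, and a smaller table means more unrelated positions colliding into the same chain that findMatch then has to walk.

package main

import "fmt"

const (
	hashBits = 15 // enlarging this is the "more memory" experiment above
	hashSize = 1 << hashBits
	hashMask = hashSize - 1
	minMatch = 3
)

func hash3(b []byte) uint32 {
	return (uint32(b[0])<<10 ^ uint32(b[1])<<5 ^ uint32(b[2])) & hashMask
}

func main() {
	data := []byte("zbcazzzzzzzzzzbc")
	head := make([]int, hashSize) // most recent position per hash, 0 = empty
	prev := make([]int, len(data))
	// Index every position before the last trigram.
	for i := 0; i+minMatch < len(data); i++ {
		h := hash3(data[i:])
		prev[i] = head[h] // remember the previous position with this hash
		head[h] = i + 1   // store positions +1 so that 0 can mean "none"
	}
	// Walk the chain for the final trigram "zbc" to find candidate matches.
	h := hash3(data[len(data)-minMatch:])
	for cand := head[h]; cand != 0; cand = prev[cand-1] {
		fmt.Println("candidate match position:", cand-1)
	}
}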
I have applied your patch to my local copy of Go and rerun my web site tests.

BBC News: Linux gzip size = 21095, original Go gzip size = 21612, patched Go gzip size = 21307
CNN: Linux gzip size = 19971, original Go gzip size = 20473, patched Go gzip size = 20073
Wikipedia: Linux gzip size = 15149, original Go gzip size = 15551, patched Go gzip size = 15172
King James Bible: Linux gzip size = 1404452, original Go gzip size = 1503220, patched Go gzip size = 1472447

In all cases the new gzip size is smaller. The sizes as a percentage of the Linux gzip size are currently as follows:

BBC 101.00%
CNN 100.51%
Wikipedia 100.15%
King James Bible 104.84%

Previously those figures were:

BBC 102.45%
CNN 102.51%
Wikipedia 102.65%
King James Bible 107.03%

So a marked improvement, albeit with room for more.
I have started seriously considering changing my last name from Krasin to Krasin-Idiot. See http://golang.org/cl/5556077/

King James Bible, -9:
1392218 pg10.go.gz
1392385 pg10.linux.gz

King James Bible, -6:
1403709 pg10.go.gz
1404431 pg10.linux.gz

This means that the Go implementation now outperforms zlib on this particular example. Basically, this single "-" cleared the history after each fillDeflate, which led to a shorter dictionary.
I've made that small change as well and rerun the numbers:

BBC News: Linux gzip size = 21095, original Go gzip size = 21612, patched Go gzip size = 21088
CNN: Linux gzip size = 19971, original Go gzip size = 20473, patched Go gzip size = 19894
Wikipedia: Linux gzip size = 15149, original Go gzip size = 15551, patched Go gzip size = 15172
King James Bible: Linux gzip size = 1404452, original Go gzip size = 1503220, patched Go gzip size = 1403723

In three of the four cases the Go gzip size is now smaller than the Linux gzip size. The absolute differences are: BBC 7 bytes, CNN 77 bytes, Wikipedia -23 bytes, Bible 729 bytes. Given that the Linux gzip files carry the filename in the gzip header, this is a very, very satisfactory result. Marvellous!

For completeness, here are the percentages:

BBC 99.97%
CNN 99.61%
Wikipedia 100.15%
King James Bible 99.95%

Given how close this is (and often better), I'm satisfied. Thanks for working on this.
I have just checked that the aforementioned "room for improvement at the cost of memory" is no longer available -- the compressed size does not become smaller if hashHead is enlarged. At the same time, I see performance issues.

King James Bible, -9:
time -p ./gogzip -n 9 > pg10.go.gz
real 0.88
user 0.87
sys 0.00
time gzip -9 < pg10.txt > pg10.linux.gz
real 0m0.457s
user 0m0.448s
sys 0m0.008s

King James Bible, -6:
time -p ./gogzip -n 6 > pg10.go.gz
real 0.56
user 0.54
sys 0.01
time gzip -6 < pg10.txt > pg10.linux.gz
real 0m0.231s
user 0m0.220s
sys 0m0.008s

I will try to understand the reason for such a gap (it may be that the Go compiler has fewer optimizations than gcc, or it may be algorithmic slowness).
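For tracking this gap without shelling out to time, here is a hedged sketch of a Go benchmark for the same measurement (the benchmark names are mine, and it assumes pg10.txt sits in the working directory); run it with go test -bench=. to get ns/op and MB/s per level.

package gzipbench

import (
	"compress/gzip"
	"io/ioutil"
	"testing"
)

func benchLevel(b *testing.B, level int) {
	data, err := ioutil.ReadFile("pg10.txt") // assumed corpus file
	if err != nil {
		b.Fatal(err)
	}
	b.SetBytes(int64(len(data)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		w, _ := gzip.NewWriterLevel(ioutil.Discard, level)
		if _, err := w.Write(data); err != nil {
			b.Fatal(err)
		}
		w.Close()
	}
}

func BenchmarkGzipLevel6(b *testing.B) { benchLevel(b, 6) }
func BenchmarkGzipLevel9(b *testing.B) { benchLevel(b, 9) }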
If you haven't used gopprof before, see http://blog.golang.org/2011/06/profiling-go-programs.html (Only works on Linux, not on Mac.)
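A sketch of how the gogzip program above could be wired up for CPU profiling with runtime/pprof, along the lines of that blog post (the -cpuprofile flag name is just a convention assumed here, not something the tools require):

package main

import (
	"compress/gzip"
	"flag"
	"io"
	"os"
	"runtime/pprof"
)

var (
	n          = flag.Int("n", 6, "Compression level (0-9)")
	cpuprofile = flag.String("cpuprofile", "", "write cpu profile to this file")
)

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			panic(err)
		}
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile()
	}
	c, _ := gzip.NewWriterLevel(os.Stdout, *n)
	io.Copy(c, os.Stdin)
	c.Close()
}

The resulting profile file can then be examined with the pprof tooling described in the post.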
Thanks, Russ. I'm already there:

Total: 12110 samples
3529 29.1% 29.1% 9979 82.4% compress/flate.(*compressor).deflate
1686 13.9% 43.1% 1686 13.9% compress/flate.(*compressor).findMatch
1222 10.1% 53.2% 1236 10.2% compress/flate.(*compressor).fillDeflate
497 4.1% 57.3% 4827 39.9% compress/flate.(*huffmanBitWriter).writeBlock
460 3.8% 61.1% 460 3.8% hash/crc32.update
371 3.1% 64.1% 2146 17.7% runtime.mallocgc
335 2.8% 66.9% 2500 20.6% compress/flate.(*huffmanEncoder).bitCounts
332 2.7% 69.6% 332 2.7% compress/flate._func_001
321 2.7% 72.3% 953 7.9% compress/flate.(*literalNodeSorter).Less
286 2.4% 74.6% 741 6.1% sweep
The first trivial improvement is here: http://golang.org/cl/5555070/

Instead of filling hashPrev and hashHead with zeros at every fillDeflate, it uses a hashOffset that is adjusted at fill time. Impact (I test on a 1.5 GB avi file), -6:

$ time gzip < /tmp/lala.avi > /dev/null
real 0m46.657s
user 0m46.415s
sys 0m0.200s

Original Go (b372a927701e tip):
$ 6g gogzip.go && 6l -o gogzip gogzip.6 && time -p ./gogzip < /tmp/lala.avi > /dev/null
real 120.20
user 119.63
sys 0.45

Patched Go:
$ 6g gogzip.go && 6l -o gogzip gogzip.6 && time -p ./gogzip < /tmp/lala.avi > /dev/null
real 108.85
user 108.35
sys 0.40

In short:
gzip: 46.657
original Go: 120.20 (2.57x gzip)
patched Go: 108.85 (2.33x gzip, a 10% improvement)
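The idea behind that change, as a minimal sketch (names and shapes are illustrative, not the real deflate.go fields): instead of zeroing the tables whenever the window slides, positions are stored biased by a running offset, so entries recorded before the slide simply fall out of the valid range and never need to be cleared.

package main

import "fmt"

const windowSize = 1 << 15

type chains struct {
	head       map[uint32]int // hash -> most recent biased position
	hashOffset int            // grows as the window slides; no table clearing
}

func (c *chains) insert(h uint32, pos int) {
	c.head[h] = pos + c.hashOffset
}

// lookup returns the in-window position for hash h, or -1 if the stored entry
// predates the current window (i.e. it is stale and would otherwise have had
// to be zeroed out).
func (c *chains) lookup(h uint32, pos int) int {
	v, ok := c.head[h]
	if !ok {
		return -1
	}
	p := v - c.hashOffset
	if p < pos-windowSize {
		return -1 // stale entry from a previous window
	}
	return p
}

// slide is called when the window moves forward by windowSize bytes.
func (c *chains) slide() {
	c.hashOffset += windowSize
}

func main() {
	c := &chains{head: make(map[uint32]int)}
	c.insert(42, 100)
	fmt.Println(c.lookup(42, 200)) // 100: still inside the window
	c.slide()
	fmt.Println(c.lookup(42, 200)) // -1: entry is now stale, no clearing needed
}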
I have another cleanup CL here: http://golang.org/cl/5561056/

When these two CLs are committed, I am going to optimize findMatch. Currently, it's always called with prevLength = 2. It should be called with a larger prevLength if fastSkipHashing == skipNever and a match was found at the previous step.
Three CLs committed:
http://code.google.com/p/go/source/detail?r=26dceba5c6100300d585bd5a87d479951f2bd498
http://code.google.com/p/go/source/detail?r=c2b80f5d73a6134afbd8110f8ab665a100dc7c95
http://code.google.com/p/go/source/detail?r=7a7845c5895511438239559968007c5c13c0a6a2

Current stats (on the same .avi file):
gzip: 46 s
Go: 107 s (2.33x)

I will continue looking for ways to speed this up.
I have decided to increase the size of hashHead from 32768 to 131072: http://golang.org/cl/5569048/

Results:

amd64, Core i7 2600, 1.5 GB avi:
gzip: 46 s
Go: 107 s (2.33x)
patched Go: 80.33 s (1.75x)

ARM, Cortex-A9 (first 50 MB of the avi):
gzip: 12.53 s
Go: 73 s (5.82x)
patched Go: 65.02 s (5.2x)

This change does not come for free. It may hurt systems with a small L2 cache (like some Atom processors) or older ARM systems (Cortex-A8 and older). At the same time, I hope that the slowdown should not be too bad, because of the heavily reduced hash collision rate.
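A toy experiment behind that intuition (the hash is an arbitrary stand-in, not the compress/flate one): on incompressible input nearly every trigram is distinct, so a 15-bit head table chains noticeably more unrelated positions together than a 17-bit one, at the cost of a larger, less cache-friendly table.

package main

import (
	"fmt"
	"math/rand"
)

// chainCollisions indexes every trigram position of data into a head table of
// the given size and counts how often a position lands behind an earlier one
// in the same slot (i.e. lengthens a chain).
func chainCollisions(data []byte, tableBits uint) int {
	head := make([]int, 1<<tableBits)
	mask := uint32(1<<tableBits - 1)
	coll := 0
	for i := 0; i+3 <= len(data); i++ {
		h := (uint32(data[i])<<10 ^ uint32(data[i+1])<<5 ^ uint32(data[i+2])) & mask
		if head[h] != 0 {
			coll++
		}
		head[h] = i + 1
	}
	return coll
}

func main() {
	// One 32 KB window of random bytes, standing in for the incompressible avi.
	data := make([]byte, 32768)
	for i := range data {
		data[i] = byte(rand.Intn(256))
	}
	fmt.Println("chained positions, 15-bit head table:", chainCollisions(data, 15))
	fmt.Println("chained positions, 17-bit head table:", chainCollisions(data, 17))
}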
I have decided to include a text file in the benchmark. gutenberg.txt is a subset of Project Gutenberg texts created with:

$ find /disk2/gutenberg/ -name "[a-h]*.txt" | xargs cat > gutenberg.txt

amd64, Core i7 2600, 435 MB gutenberg.txt:
gzip: 26 s
Go: 57.88 s (2.23x)
patched Go: 57.62 s (2.22x)

ARM, Cortex-A9, first 50 MB of gutenberg.txt:
gzip: 30.79 s
Go: 84.33 s (2.74x)
patched Go: 84.02 s (2.73x)

The text case is mostly unaffected by this change, which suggests that the hash function is quite good, because it worked well even with the smaller table size.
What I like about this change (http://golang.org/cl/5569048/) is that it changes the profile in a very clear way: it removes the unproductive findMatch calls in the case where the data is already compressed (like the avi).

Before the change:

avi:
Total: 10769 samples
3276 30.4% 30.4% 9896 91.9% compress/flate.(*compressor).deflate
1866 17.3% 47.7% 1866 17.3% compress/flate.(*compressor).findMatch
473 4.4% 52.1% 4818 44.7% compress/flate.(*huffmanBitWriter).writeBlock
425 3.9% 56.1% 2256 20.9% runtime.mallocgc
420 3.9% 60.0% 420 3.9% hash/crc32.update
331 3.1% 63.1% 2590 24.1% compress/flate.(*huffmanEncoder).bitCounts
328 3.0% 66.1% 328 3.0% compress/flate._func_001
304 2.8% 68.9% 803 7.5% sweep
300 2.8% 71.7% 966 9.0% compress/flate.(*literalNodeSorter).Less
238 2.2% 73.9% 793 7.4% compress/flate.literalNodeSorter.Less

text:
Total: 5969 samples
4594 77.0% 77.0% 4594 77.0% compress/flate.(*compressor).findMatch
643 10.8% 87.7% 5781 96.9% compress/flate.(*compressor).deflate
207 3.5% 91.2% 563 9.4% compress/flate.(*huffmanBitWriter).writeBlock
127 2.1% 93.3% 127 2.1% hash/crc32.update
95 1.6% 94.9% 167 2.8% compress/flate.(*huffmanBitWriter).writeBits
68 1.1% 96.1% 157 2.6% compress/flate.(*huffmanBitWriter).writeCode
40 0.7% 96.7% 77 1.3% compress/flate.(*huffmanBitWriter).flushBits
26 0.4% 97.2% 26 0.4% compress/flate.offsetCode
25 0.4% 97.6% 29 0.5% syscall.Syscall
12 0.2% 97.8% 80 1.3% runtime.mallocgc

After the change:

avi:
Total: 8885 samples
2721 30.6% 30.6% 8061 90.7% compress/flate.(*compressor).deflate
652 7.3% 38.0% 652 7.3% compress/flate.(*compressor).findMatch
506 5.7% 43.7% 4719 53.1% compress/flate.(*huffmanBitWriter).writeBlock
429 4.8% 48.5% 2210 24.9% runtime.mallocgc
414 4.7% 53.1% 414 4.7% hash/crc32.update
324 3.6% 56.8% 324 3.6% compress/flate._func_001
309 3.5% 60.3% 806 9.1% sweep
307 3.5% 63.7% 2489 28.0% compress/flate.(*huffmanEncoder).bitCounts
303 3.4% 67.1% 934 10.5% compress/flate.(*literalNodeSorter).Less
225 2.5% 69.7% 727 8.2% runtime.MCache_Alloc

text:
Total: 5875 samples
4594 78.2% 78.2% 4594 78.2% compress/flate.(*compressor).findMatch
585 10.0% 88.2% 5697 97.0% compress/flate.(*compressor).deflate
198 3.4% 91.5% 541 9.2% compress/flate.(*huffmanBitWriter).writeBlock
107 1.8% 93.3% 107 1.8% hash/crc32.update
87 1.5% 94.8% 153 2.6% compress/flate.(*huffmanBitWriter).writeBits
58 1.0% 95.8% 148 2.5% compress/flate.(*huffmanBitWriter).writeCode
32 0.5% 96.4% 71 1.2% compress/flate.(*huffmanBitWriter).flushBits
28 0.5% 96.8% 35 0.6% syscall.Syscall
22 0.4% 97.2% 22 0.4% compress/flate.offsetCode
21 0.4% 97.6% 91 1.5% compress/flate.(*huffmanEncoder).bitCounts

(I'm sorry for the many small updates on this issue. Feel free to unsubscribe; I would like to continue posting my progress here.)
http://golang.org/cl/5569048/ is committed.

Currently, I have evaluated a number of possible optimizations of findMatch, and all of them are effectively defeated by bounds checking. I'm currently thinking of two possible ways to go:

1. Use unsafe.Pointer and uintptr within findMatch. This would make the code less readable.
2. Put findMatch into a .c file.

I dislike both options. Russ, which is preferable? Am I missing something?
Could you post results with bounds checking disabled? That would give us an idea of how much it is currently costing us. My preference (which doesn't really matter) would be to leave it in "pure'n safe" Go, and focus on improvements to the compiler, e.g. whether there are ways it could determine that bounds checking is unneeded...
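One general pattern that sometimes helps while staying in safe Go (a generic illustration, not taken from the flate code, and how much it helps depends on the compiler version): hoist a single slice bound so the later constant indices are provably in range, instead of paying a check on every access in the inner loop.

package main

import "fmt"

// sum8 reads b[0..7] with a bounds check potentially paid on each access.
func sum8(b []byte) int {
	s := 0
	for i := 0; i < 8; i++ {
		s += int(b[i])
	}
	return s
}

// sum8Hinted narrows the slice once up front; inside the loop the indices
// 0..7 are provably within len(b), which gives the compiler a chance to drop
// the per-access checks.
func sum8Hinted(b []byte) int {
	b = b[:8] // single check (and panic point) up front
	s := 0
	for i := 0; i < 8; i++ {
		s += int(b[i])
	}
	return s
}

func main() {
	data := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9}
	fmt.Println(sum8(data), sum8Hinted(data))
}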
I take my words back. This is not bounds checking, at least not on the compressible data. I decided to look at the distribution of actual tries (see findMatch).

gutenberg.txt:
0: 0 1: 7030274 2: 6152709 3: 5545541 4: 5087985 5: 4695312 6: 4343402 7: 4047307 8: 3772723 9: 3523595 10: 3302578 11: 3104393 12: 2923958 13: 2760165 14: 2609980 15: 2470225 16: 2344119 17: 2221547 18: 2114974 19: 2013985 20: 1925451 21: 1840720 22: 1767779 23: 1701087 24: 1640795 25: 1583392 26: 1529609 27: 1483136 28: 1436692 29: 1395554 30: 1349483 31: 1304079 32: 1258346 33: 1213482 34: 1168298 35: 1121334 36: 1076541 37: 1032775 38: 991983 39: 953072 40: 916813 41: 882443 42: 850751 43: 818134 44: 788817 45: 759793 46: 734664 47: 707436 48: 684333 49: 661871 50: 642868 51: 625065 52: 606764 53: 587920 54: 572157 55: 554170 56: 540519 57: 526047 58: 513336 59: 499707 60: 487171 61: 477184 62: 468153 63: 457002 64: 446098 65: 436812 66: 426783 67: 418369 68: 410367 69: 401949 70: 394634 71: 386615 72: 376867 73: 370831 74: 361638 75: 356390 76: 349221 77: 341120 78: 335207 79: 327415 80: 319881 81: 314124 82: 309320 83: 302548 84: 297445 85: 290371 86: 282705 87: 278527 88: 273746 89: 267290 90: 263012 91: 257661 92: 252553 93: 247228 94: 242580 95: 238915 96: 234916 97: 229083 98: 224706 99: 221516 100: 217734 101: 213113 102: 209530 103: 206625 104: 202630 105: 198859 106: 195559 107: 191800 108: 188813 109: 185493 110: 182859 111: 179525 112: 176293 113: 174206 114: 170770 115: 168445 116: 164788 117: 162212 118: 160223 119: 157799 120: 155457 121: 152843 122: 151004 123: 148517 124: 146771 125: 144575 126: 142698 127: 139581 128: 14601181

avi file:
ActualTriesMap: 0: 0 1: 304332953 2: 38984672 3: 3653538 4: 408016 5: 118865 6: 70082 7: 50490 8: 39877 9: 32578 10: 28537 11: 26114 12: 24136 13: 22615 14: 20953 15: 19872 16: 18952 17: 18199 18: 17621 19: 17402 20: 17296 21: 17029 22: 17012 23: 16717 24: 16608 25: 16154 26: 15905 27: 15975 28: 15847 29: 15544 30: 15608 31: 15535 32: 15200 33: 14892 34: 14562 35: 14527 36: 14196 37: 13670 38: 13399 39: 12826 40: 12277 41: 11834 42: 11352 43: 10750 44: 10180 45: 9630 46: 8940 47: 8300 48: 7865 49: 7216 50: 6734 51: 6199 52: 5806 53: 5296 54: 4837 55: 4452 56: 4063 57: 3651 58: 3215 59: 3011 60: 2679 61: 2420 62: 2407 63: 2221 64: 1976 65: 1703 66: 1592 67: 1418 68: 1218 69: 1117 70: 1039 71: 969 72: 877 73: 801 74: 746 75: 652 76: 573 77: 548 78: 481 79: 432 80: 396 81: 358 82: 330 83: 286 84: 265 85: 263 86: 214 87: 200 88: 161 89: 145 90: 130 91: 98 92: 93 93: 91 94: 79 95: 67 96: 57 97: 55 98: 44 99: 31 100: 27 101: 28 102: 22 103: 27 104: 21 105: 24 106: 16 107: 19 108: 16 109: 17 110: 16 111: 11 112: 12 113: 14 114: 11 115: 6 116: 5 117: 6 118: 7 119: 8 120: 10 121: 9 122: 5 123: 7 124: 10 125: 7 126: 13 127: 14 128: 6674

In the case of text, it's not hash collisions; it's the ground truth: there are very popular character-level trigrams (which ones may differ, depending on the actual data). I will think a bit about how to improve the data structure, because the last bucket in the text case suggests that we're doing a lot of worthless iterations.
Yes, it's an algorithmic issue with trigrams: http://en.wikipedia.org/wiki/Trigram

We spend too much time on "the", "and" and related strings, because the chains for them are really long. I am currently trying to produce a solution for hot trigrams. If my idea does not work, I will try to look into gzip, or kindly ask Jean-Loup Gailly to help.
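For reference, the classic zlib-style mitigation for hot chains is to bound the work per position: cap how many chain entries are examined, and stop early once a match is already "good enough". The sketch below uses illustrative constants and helpers, not the actual compress/flate or zlib code, and the tiny chain in main is faked by hand.

package main

import "fmt"

const (
	maxChainLength = 128 // give up after this many candidates
	niceLength     = 32  // a match this long is good enough; stop searching
)

// bestMatch walks the prev chain starting at head, but bounds the work so a
// very popular trigram cannot make a single position cost thousands of probes.
// Positions in head/prev are stored +1 so that 0 can mean "end of chain".
func bestMatch(data []byte, pos int, head int, prev []int) (length, offset int) {
	chain := maxChainLength
	for cand := head; cand != 0 && chain > 0; cand, chain = prev[cand-1], chain-1 {
		c := cand - 1
		n := 0
		for pos+n < len(data) && data[c+n] == data[pos+n] && n < 258 {
			n++
		}
		if n > length {
			length, offset = n, pos-c
			if length >= niceLength {
				break // good enough, don't burn time on the rest of the chain
			}
		}
	}
	return length, offset
}

func main() {
	data := []byte("the cat and the dog and the bird and the fish")
	// head/prev would normally be maintained by the insert loop; here we fake a
	// tiny chain for the trigram "the" at positions 24 -> 12 -> 0.
	prev := make([]int, len(data))
	prev[24] = 13 // position 12, stored +1
	prev[12] = 1  // position 0, stored +1
	length, offset := bestMatch(data, 37, 25, prev) // encode the final "the fish"
	fmt.Printf("best match: length=%d offset=%d\n", length, offset)
}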
Two updates:

1. A minor cleanup CL that does not affect performance (removing dead code): http://golang.org/cl/5577061/
2. A second CL that dramatically reduces the number of runtime.GC calls caused by huffmanEncoder: http://golang.org/cl/5573076/ (not completely ready to submit)

The second CL depends on the first (unintentionally). Stats are given for the maximum compression level (-9):

gutenberg:
gzip: 37.52
Go: 76 (2.02x)
patched Go: 74.43 (1.98x, a 2% improvement -- most of the time is spent in findMatch)

avi:
gzip: 46.21
Go: 90 (1.95x)
patched Go: 67.79 (1.47x, a 25% improvement)

Obviously, GC issues only affect large files, but it's probably the right step anyway. I will provide stats for other compression levels tomorrow.
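The allocation-reuse pattern behind that second CL, as a generic hedged sketch (the real huffmanEncoder fields and types differ): keep scratch slices on the encoder and reslice them to zero length for each block instead of reallocating, so steady-state compression stops producing per-block garbage for the collector to chase.

package main

import "fmt"

type node struct {
	literal uint16
	freq    int32
}

type encoder struct {
	scratch []node // reused across blocks instead of reallocated
}

// buildFreqList fills the scratch slice with the nonzero-frequency symbols.
// Appending to a zero-length reslice reuses the old backing array once it has
// grown to its steady-state size, so per-block allocations drop to zero.
func (e *encoder) buildFreqList(freq []int32) []node {
	e.scratch = e.scratch[:0]
	for lit, f := range freq {
		if f != 0 {
			e.scratch = append(e.scratch, node{uint16(lit), f})
		}
	}
	return e.scratch
}

func main() {
	e := &encoder{}
	freq := make([]int32, 286)
	freq['e'], freq['t'], freq['a'] = 120, 90, 75
	for block := 0; block < 3; block++ {
		list := e.buildFreqList(freq)
		fmt.Printf("block %d: %d symbols, cap %d (backing array reused)\n",
			block, len(list), cap(list))
	}
}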
Let's postpone any algorithmic changes to gzip until after Go 1. I'd like to let things settle so that we have some confidence in the stability of the code we're going to release. It's getting to be too late to be making significant changes, which could be harboring new bugs to find and fix (or miss).
Russ, is http://golang.org/cl/5573076/ in the category of algorithmic changes?
CL 5573076 was superseded by https://golang.org/cl/10006043 (revision 2fe813f4f3c2).