I looked a little deeper into this by examining two files made by compressing the King James Bible with native Debian Linux gzip and with compress/gzip. The natively gzipped file contains 23 blocks encoded using dynamic Huffman trees. The compress/gzipped file contains 47 blocks encoded using dynamic Huffman trees, plus an empty block at the end containing no data at all. This seems to indicate that compress/gzip (actually compress/flate) decides to emit a new block much more frequently than native gzip. At the same time, the emission of blocks is partly controlled by the maxFlateBlockTokens constant in deflate.go. Changing this reduces the output size (but not by enough to reach native gzip levels).
Experimenting further with maxFlateBlockTokens, it's definitely possible to reduce the number of blocks emitted. With maxFlateBlockTokens changed from 1 << 14 to 1 << 16, compressing the same King James Bible produces 12 blocks. With it equal to 1 << 15, a total of 24 blocks are emitted. However, the file size has not changed greatly:

Standard compress/gzip = 1507879
With 1<<15 = 1506733
With 1<<16 = 1506281

So the problem must go deeper into the actual compression itself.
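To make the role of that constant concrete, here is a minimal, hypothetical sketch of the flush-when-full pattern it controls (this is not the real deflate.go code; token, blockWriter, and the limit value are illustrative): a larger per-block token limit means fewer, larger blocks for the same input, each with its own Huffman trees.

package main

import "fmt"

const maxTokens = 1 << 14 // hypothetical per-block token limit

type token uint32

type blockWriter struct {
	tokens []token
	blocks int
}

func (w *blockWriter) add(t token) {
	w.tokens = append(w.tokens, t)
	if len(w.tokens) == maxTokens {
		w.writeBlock() // a full buffer forces a new DEFLATE block
	}
}

func (w *blockWriter) writeBlock() {
	if len(w.tokens) == 0 {
		return
	}
	w.blocks++
	w.tokens = w.tokens[:0]
}

func main() {
	w := &blockWriter{}
	for i := 0; i < 400000; i++ { // pretend the input produced 400k tokens
		w.add(token(i))
	}
	w.writeBlock() // final partial block
	fmt.Println("blocks emitted:", w.blocks)
}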
Reproduced.

$ hg identify
0b98ba2443b0 tip

I have used the King James Bible from:

$ wget http://www.gutenberg.org/cache/epub/10/pg10.txt
4452069 pg10.txt

I used the following Go program:

$ cat gogzip.go
package main

import (
	"compress/gzip"
	"flag"
	"io"
	"os"
)

var n = flag.Int("n", 6, "Compression level (0-9)")

func main() {
	flag.Parse()
	c, _ := gzip.NewWriterLevel(os.Stdout, *n)
	io.Copy(c, os.Stdin)
}

$ cat pg10.txt | gzip -6 > pg10.linux.gz
$ cat pg10.txt | ./gogzip -n=6 > pg10.go.gz
1486154 pg10.go.gz
1404431 pg10.linux.gz

What I don't like is that my source file is not exactly the same as the topic starter's, but anyway: the issue exists. I will try to find out the source of the problem.
My program above has an error (a missing c.Close(), so bytes may get stuck in the buffer without being flushed to stdout, which is reproducible on all-zero input). The correct program is:

package main

import (
	"compress/gzip"
	"flag"
	"io"
	"os"
)

var n = flag.Int("n", 6, "Compression level (0-9)")

func main() {
	flag.Parse()
	c, _ := gzip.NewWriterLevel(os.Stdout, *n)
	io.Copy(c, os.Stdin)
	c.Close()
}

Test results:
1048576 zero.1M.bin
1051 zero.1M.bin.linux.gz
1055 zero.1M.bin.go.gz
4452069 pg10.txt
1404431 pg10.linux.gz
1507801 pg10.go.gz
The preview of the fix for lazy matching is here: http://golang.org/cl/5554066

Compression results:

level=6:
4452069 pg10.txt
1404431 pg10.linux.gz
1472391 pg10.go.gz

level=9:
4452069 pg10.txt
1392385 pg10.linux.gz
1467823 pg10.go.gz

The gap is still there, but it's getting better.
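For readers unfamiliar with the heuristic that CL implements, here is a rough, self-contained sketch of lazy matching (findMatch here is a simplified stand-in, not the compress/flate function): before committing to the match found at position i, the encoder peeks at i+1 and prefers a single literal plus the longer match when one starts there.

package main

import "fmt"

// findMatch is a hypothetical helper returning the longest back-reference
// length starting at pos (0 means no usable match).
func findMatch(data []byte, pos int) int {
	best := 0
	for start := 0; start < pos; start++ {
		n := 0
		for pos+n < len(data) && data[start+n] == data[pos+n] && n < 258 {
			n++
		}
		if n > best {
			best = n
		}
	}
	return best
}

func main() {
	data := []byte("abcde bcdefgh abcdefgh")
	i := 0
	for i < len(data) {
		cur := findMatch(data, i)
		if cur >= 3 && i+1 < len(data) && findMatch(data, i+1) > cur {
			fmt.Printf("literal %q (deferred: longer match at next position)\n", data[i])
			i++ // lazy step: emit a literal and reconsider at i+1
			continue
		}
		if cur >= 3 {
			fmt.Printf("match of length %d\n", cur)
			i += cur
		} else {
			fmt.Printf("literal %q\n", data[i])
			i++
		}
	}
}

On this input the greedy choice at the final "a" would be a length-5 match; the lazy step emits the "a" as a literal and takes the length-7 match starting at "b" instead.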
Not yet, because I see cases where Go produces slightly shorter results as well:

head -n 1000 pg10.txt > short.txt
42728 short.txt
13658 short.go.gz
13664 short.linux.gz

I would like to polish the fix above (by adding more DeflateInflate tests to ensure correctness) and after that understand this difference.
I see some room for improvement at the cost of some memory. Currently, the matching algorithm uses a 32k int array to store hashes, and hash collisions prevent finding a match. Naively increasing the hashHead size (and changing several related constants accordingly) gives the following results:

-6: 1463307 (vs 1472391 above)
-9: 1459164 (vs 1467823 above)

This is not immediately applicable because for some reason it slows down the encoder. Anyway, it's not a big deal to fix. Moreover, there is a good chance that using more memory would actually speed up the encoder, because the hash function could be simplified.

Another room for improvement is to allow decreasing the length of the previous match by 1 if its length > 3. It may make sense for text like the following:

zbcazzzzzzzzzzbc

Now, if I am correct, the algorithm produces something like:

[z][b][c][a][z][z][z](zzz)(zzzz)[b][c] // 11 tokens

It could be improved to:

[z][b][c][a][z][z][z](zzz)(zzz)(zbc) // 10 tokens

I will probably do something about this once http://golang.org/cl/5554066 is ready and committed.
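For context, a minimal sketch of the hash-chain structure being described (field names and the hash are illustrative assumptions, not the real compress/flate code): head maps a trigram hash to the most recent position with that hash, prev chains back to earlier positions, and a smaller table means more unrelated positions colliding into the same chain that findMatch then has to walk.

package main

import "fmt"

const (
	hashBits = 15 // enlarging this is the "more memory" experiment above
	hashSize = 1 << hashBits
	hashMask = hashSize - 1
	minMatch = 3
)

func hash3(b []byte) uint32 {
	return (uint32(b[0])<<10 ^ uint32(b[1])<<5 ^ uint32(b[2])) & hashMask
}

func main() {
	data := []byte("zbcazzzzzzzzzzbc")
	head := make([]int, hashSize) // most recent position per hash, 0 = empty
	prev := make([]int, len(data))
	// Index every position before the last trigram.
	for i := 0; i+minMatch < len(data); i++ {
		h := hash3(data[i:])
		prev[i] = head[h] // remember the previous position with this hash
		head[h] = i + 1   // store positions +1 so that 0 can mean "none"
	}
	// Walk the chain for the final trigram "zbc" to find candidate matches.
	h := hash3(data[len(data)-minMatch:])
	for cand := head[h]; cand != 0; cand = prev[cand-1] {
		fmt.Println("candidate match position:", cand-1)
	}
}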
I have applied your patch to my local copy of Go and rerun my web site tests.

BBC News: Linux gzip size = 21095, original Go gzip size = 21612, patched Go gzip size = 21307
CNN: Linux gzip size = 19971, original Go gzip size = 20473, patched Go gzip size = 20073
Wikipedia: Linux gzip size = 15149, original Go gzip size = 15551, patched Go gzip size = 15172
King James Bible: Linux gzip size = 1404452, original Go gzip size = 1503220, patched Go gzip size = 1472447

In all cases the new gzip size is smaller. The sizes as a percentage of the Linux gzip size are currently as follows:

BBC 101.00%
CNN 100.51%
Wikipedia 100.15%
King James Bible 104.84%

Previously those figures were:

BBC 102.45%
CNN 102.51%
Wikipedia 102.65%
King James Bible 107.03%

So a marked improvement, albeit with room for more.
I have started seriously considering changing my last name from Krasin to Krasin-Idiot. See http://golang.org/cl/5556077/

King James Bible, -9:
1392218 pg10.go.gz
1392385 pg10.linux.gz

King James Bible, -6:
1403709 pg10.go.gz
1404431 pg10.linux.gz

This means that the Go implementation now outperforms zlib on this particular example. Basically, this single "-" cleared the history after each fillDeflate, which led to a shorter dictionary.
I've made that small change as well and rerun the numbers:

BBC News: Linux gzip size = 21095, original Go gzip size = 21612, patched Go gzip size = 21088
CNN: Linux gzip size = 19971, original Go gzip size = 20473, patched Go gzip size = 19894
Wikipedia: Linux gzip size = 15149, original Go gzip size = 15551, patched Go gzip size = 15172
King James Bible: Linux gzip size = 1404452, original Go gzip size = 1503220, patched Go gzip size = 1403723

In three of the four cases the Go gzip size is now smaller than the Linux gzip size. The absolute differences are: BBC 7 bytes, CNN 77 bytes, Wikipedia -23 bytes, Bible 729 bytes. Given that the Linux gzip files carry the filename in the gzip header, this is a very, very satisfactory result. Marvellous!

For completeness, here are the percentages:

BBC 99.97%
CNN 99.61%
Wikipedia 100.15%
King James Bible 99.95%

Given how close this is (and often better), I'm satisfied. Thanks for working on this.
I have just checked that the aforementioned "room for improvement at the cost of memory" is no longer available -- the compressed size does not become smaller if hashHead is enlarged. At the same time, I see performance issues.

King James Bible, -9:
time -p ./gogzip -n 9 > pg10.go.gz
real 0.88
user 0.87
sys 0.00
time gzip -9 < pg10.txt > pg10.linux.gz
real 0m0.457s
user 0m0.448s
sys 0m0.008s

King James Bible, -6:
time -p ./gogzip -n 6 > pg10.go.gz
real 0.56
user 0.54
sys 0.01
time gzip -6 < pg10.txt > pg10.linux.gz
real 0m0.231s
user 0m0.220s
sys 0m0.008s

I will try to understand the reason for such a gap (it may be that the Go compiler has fewer optimizations than gcc, or it may be algorithmic slowness).
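For tracking this gap without shelling out to time, here is a hedged sketch of a Go benchmark for the same measurement (the benchmark names are mine, and it assumes pg10.txt sits in the working directory); run it with go test -bench=. to get ns/op and MB/s per level.

package gzipbench

import (
	"compress/gzip"
	"io/ioutil"
	"testing"
)

func benchLevel(b *testing.B, level int) {
	data, err := ioutil.ReadFile("pg10.txt") // assumed corpus file
	if err != nil {
		b.Fatal(err)
	}
	b.SetBytes(int64(len(data)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		w, _ := gzip.NewWriterLevel(ioutil.Discard, level)
		if _, err := w.Write(data); err != nil {
			b.Fatal(err)
		}
		w.Close()
	}
}

func BenchmarkGzipLevel6(b *testing.B) { benchLevel(b, 6) }
func BenchmarkGzipLevel9(b *testing.B) { benchLevel(b, 9) }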
If you haven't used gopprof before, see http://blog.golang.org/2011/06/profiling-go-programs.html (Only works on Linux, not on Mac.)
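A sketch of how the gogzip program above could be wired up for CPU profiling with runtime/pprof, along the lines of that blog post (the -cpuprofile flag name is just a convention assumed here, not something the tools require):

package main

import (
	"compress/gzip"
	"flag"
	"io"
	"os"
	"runtime/pprof"
)

var (
	n          = flag.Int("n", 6, "Compression level (0-9)")
	cpuprofile = flag.String("cpuprofile", "", "write cpu profile to this file")
)

func main() {
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			panic(err)
		}
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile()
	}
	c, _ := gzip.NewWriterLevel(os.Stdout, *n)
	io.Copy(c, os.Stdin)
	c.Close()
}

The resulting profile file can then be examined with the pprof tooling described in the post.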
Thanks, Russ. I'm already there:

Total: 12110 samples
3529 29.1% 29.1% 9979 82.4% compress/flate.(*compressor).deflate
1686 13.9% 43.1% 1686 13.9% compress/flate.(*compressor).findMatch
1222 10.1% 53.2% 1236 10.2% compress/flate.(*compressor).fillDeflate
497 4.1% 57.3% 4827 39.9% compress/flate.(*huffmanBitWriter).writeBlock
460 3.8% 61.1% 460 3.8% hash/crc32.update
371 3.1% 64.1% 2146 17.7% runtime.mallocgc
335 2.8% 66.9% 2500 20.6% compress/flate.(*huffmanEncoder).bitCounts
332 2.7% 69.6% 332 2.7% compress/flate._func_001
321 2.7% 72.3% 953 7.9% compress/flate.(*literalNodeSorter).Less
286 2.4% 74.6% 741 6.1% sweep
The first trivial improvement is here: http://golang.org/cl/5555070/

Instead of filling hashPrev and hashHead with zeros at every fillDeflate, it uses a hashOffset that is adjusted at fill time. Impact (I test on a 1.5 GB avi file), -6:

$ time gzip < /tmp/lala.avi > /dev/null
real 0m46.657s
user 0m46.415s
sys 0m0.200s

Original Go (b372a927701e tip):
$ 6g gogzip.go && 6l -o gogzip gogzip.6 && time -p ./gogzip < /tmp/lala.avi > /dev/null
real 120.20
user 119.63
sys 0.45

Patched Go:
$ 6g gogzip.go && 6l -o gogzip gogzip.6 && time -p ./gogzip < /tmp/lala.avi > /dev/null
real 108.85
user 108.35
sys 0.40

In short:
gzip: 46.657
original Go: 120.20 (2.57x gzip)
patched Go: 108.85 (2.33x gzip, a 10% improvement)
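The idea behind that change, as a minimal sketch (names and shapes are illustrative, not the real deflate.go fields): instead of zeroing the tables whenever the window slides, positions are stored biased by a running offset, so entries recorded before the slide simply fall out of the valid range and never need to be cleared.

package main

import "fmt"

const windowSize = 1 << 15

type chains struct {
	head       map[uint32]int // hash -> most recent biased position
	hashOffset int            // grows as the window slides; no table clearing
}

func (c *chains) insert(h uint32, pos int) {
	c.head[h] = pos + c.hashOffset
}

// lookup returns the in-window position for hash h, or -1 if the stored entry
// predates the current window (i.e. it is stale and would otherwise have had
// to be zeroed out).
func (c *chains) lookup(h uint32, pos int) int {
	v, ok := c.head[h]
	if !ok {
		return -1
	}
	p := v - c.hashOffset
	if p < pos-windowSize {
		return -1 // stale entry from a previous window
	}
	return p
}

// slide is called when the window moves forward by windowSize bytes.
func (c *chains) slide() {
	c.hashOffset += windowSize
}

func main() {
	c := &chains{head: make(map[uint32]int)}
	c.insert(42, 100)
	fmt.Println(c.lookup(42, 200)) // 100: still inside the window
	c.slide()
	fmt.Println(c.lookup(42, 200)) // -1: entry is now stale, no clearing needed
}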
I have another cleanup CL here: http://golang.org/cl/5561056/

When these two CLs are committed, I am going to optimize findMatch. Currently, it's always called with prevLength = 2. It should be called with a larger prevLength if fastSkipHashing == skipNever and a match was found at the previous step.
Three CLs committed:
http://code.google.com/p/go/source/detail?r=26dceba5c6100300d585bd5a87d479951f2bd498
http://code.google.com/p/go/source/detail?r=c2b80f5d73a6134afbd8110f8ab665a100dc7c95
http://code.google.com/p/go/source/detail?r=7a7845c5895511438239559968007c5c13c0a6a2

Current stats (on the same .avi file):
gzip: 46 s
Go: 107 s (2.33x)

I will continue looking for ways to speed this up.
I have decided to increase the size of hashHead from 32768 to 131072: http://golang.org/cl/5569048/

Results:

amd64, Core i7 2600, 1.5 GB avi:
gzip: 46 s
Go: 107 s (2.33x)
patched Go: 80.33 s (1.75x)

ARM, Cortex-A9 (first 50 MB of the avi):
gzip: 12.53 s
Go: 73 s (5.82x)
patched Go: 65.02 s (5.2x)

This change does not come for free. It may hurt systems with a small L2 cache (like some Atom processors) or older ARM systems (Cortex-A8 and older). At the same time, I hope that the slowdown should not be too bad, because of the heavily reduced hash collision rate.
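A toy experiment behind that intuition (the hash is an arbitrary stand-in, not the compress/flate one): on incompressible input nearly every trigram is distinct, so a 15-bit head table chains noticeably more unrelated positions together than a 17-bit one, at the cost of a larger, less cache-friendly table.

package main

import (
	"fmt"
	"math/rand"
)

// chainCollisions indexes every trigram position of data into a head table of
// the given size and counts how often a position lands behind an earlier one
// in the same slot (i.e. lengthens a chain).
func chainCollisions(data []byte, tableBits uint) int {
	head := make([]int, 1<<tableBits)
	mask := uint32(1<<tableBits - 1)
	coll := 0
	for i := 0; i+3 <= len(data); i++ {
		h := (uint32(data[i])<<10 ^ uint32(data[i+1])<<5 ^ uint32(data[i+2])) & mask
		if head[h] != 0 {
			coll++
		}
		head[h] = i + 1
	}
	return coll
}

func main() {
	// One 32 KB window of random bytes, standing in for the incompressible avi.
	data := make([]byte, 32768)
	for i := range data {
		data[i] = byte(rand.Intn(256))
	}
	fmt.Println("chained positions, 15-bit head table:", chainCollisions(data, 15))
	fmt.Println("chained positions, 17-bit head table:", chainCollisions(data, 17))
}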
I have decided to include a text file in the benchmark. gutenberg.txt is a subset of Project Gutenberg texts created with:

$ find /disk2/gutenberg/ -name "[a-h]*.txt" | xargs cat > gutenberg.txt

amd64, Core i7 2600, 435 MB gutenberg.txt:
gzip: 26 s
Go: 57.88 s (2.23x)
patched Go: 57.62 s (2.22x)

ARM, Cortex-A9, first 50 MB of gutenberg.txt:
gzip: 30.79 s
Go: 84.33 s (2.74x)
patched Go: 84.02 s (2.73x)

The text case is mostly unaffected by this change, which suggests that the hash function is quite good, because it worked well even with the smaller table size.
What I like about this change (http://golang.org/cl/5569048/) is that it changes the profile in a very clear way: it removes the unproductive findMatch calls in the case where the data is already compressed (like the avi).

Before the change:

avi:
Total: 10769 samples
3276 30.4% 30.4% 9896 91.9% compress/flate.(*compressor).deflate
1866 17.3% 47.7% 1866 17.3% compress/flate.(*compressor).findMatch
473 4.4% 52.1% 4818 44.7% compress/flate.(*huffmanBitWriter).writeBlock
425 3.9% 56.1% 2256 20.9% runtime.mallocgc
420 3.9% 60.0% 420 3.9% hash/crc32.update
331 3.1% 63.1% 2590 24.1% compress/flate.(*huffmanEncoder).bitCounts
328 3.0% 66.1% 328 3.0% compress/flate._func_001
304 2.8% 68.9% 803 7.5% sweep
300 2.8% 71.7% 966 9.0% compress/flate.(*literalNodeSorter).Less
238 2.2% 73.9% 793 7.4% compress/flate.literalNodeSorter.Less

text:
Total: 5969 samples
4594 77.0% 77.0% 4594 77.0% compress/flate.(*compressor).findMatch
643 10.8% 87.7% 5781 96.9% compress/flate.(*compressor).deflate
207 3.5% 91.2% 563 9.4% compress/flate.(*huffmanBitWriter).writeBlock
127 2.1% 93.3% 127 2.1% hash/crc32.update
95 1.6% 94.9% 167 2.8% compress/flate.(*huffmanBitWriter).writeBits
68 1.1% 96.1% 157 2.6% compress/flate.(*huffmanBitWriter).writeCode
40 0.7% 96.7% 77 1.3% compress/flate.(*huffmanBitWriter).flushBits
26 0.4% 97.2% 26 0.4% compress/flate.offsetCode
25 0.4% 97.6% 29 0.5% syscall.Syscall
12 0.2% 97.8% 80 1.3% runtime.mallocgc

After the change:

avi:
Total: 8885 samples
2721 30.6% 30.6% 8061 90.7% compress/flate.(*compressor).deflate
652 7.3% 38.0% 652 7.3% compress/flate.(*compressor).findMatch
506 5.7% 43.7% 4719 53.1% compress/flate.(*huffmanBitWriter).writeBlock
429 4.8% 48.5% 2210 24.9% runtime.mallocgc
414 4.7% 53.1% 414 4.7% hash/crc32.update
324 3.6% 56.8% 324 3.6% compress/flate._func_001
309 3.5% 60.3% 806 9.1% sweep
307 3.5% 63.7% 2489 28.0% compress/flate.(*huffmanEncoder).bitCounts
303 3.4% 67.1% 934 10.5% compress/flate.(*literalNodeSorter).Less
225 2.5% 69.7% 727 8.2% runtime.MCache_Alloc

text:
Total: 5875 samples
4594 78.2% 78.2% 4594 78.2% compress/flate.(*compressor).findMatch
585 10.0% 88.2% 5697 97.0% compress/flate.(*compressor).deflate
198 3.4% 91.5% 541 9.2% compress/flate.(*huffmanBitWriter).writeBlock
107 1.8% 93.3% 107 1.8% hash/crc32.update
87 1.5% 94.8% 153 2.6% compress/flate.(*huffmanBitWriter).writeBits
58 1.0% 95.8% 148 2.5% compress/flate.(*huffmanBitWriter).writeCode
32 0.5% 96.4% 71 1.2% compress/flate.(*huffmanBitWriter).flushBits
28 0.5% 96.8% 35 0.6% syscall.Syscall
22 0.4% 97.2% 22 0.4% compress/flate.offsetCode
21 0.4% 97.6% 91 1.5% compress/flate.(*huffmanEncoder).bitCounts

(I'm sorry for the many small updates on this issue. Feel free to unsubscribe; I would like to continue posting my progress here.)
http://golang.org/cl/5569048/ is committed.

Currently, I have evaluated a number of possible optimizations of findMatch, and all of them are effectively defeated by bounds checking. I'm currently thinking of two possible ways to go:

1. Use unsafe.Pointer and uintptr within findMatch. This would make the code less readable.
2. Put findMatch into a .c file.

I dislike both options. Russ, which is preferable? Am I missing something?
Could you post results with bounds checking disabled? That would give us an idea of how much it is currently costing us. My preference (which doesn't really matter) would be to leave it in "pure'n safe" Go, and focus on improvements to the compiler, e.g. whether there are ways it could determine that bounds checking is unneeded...
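One general pattern that sometimes helps while staying in safe Go (a generic illustration, not taken from the flate code, and how much it helps depends on the compiler version): hoist a single slice bound so the later constant indices are provably in range, instead of paying a check on every access in the inner loop.

package main

import "fmt"

// sum8 reads b[0..7] with a bounds check potentially paid on each access.
func sum8(b []byte) int {
	s := 0
	for i := 0; i < 8; i++ {
		s += int(b[i])
	}
	return s
}

// sum8Hinted narrows the slice once up front; inside the loop the indices
// 0..7 are provably within len(b), which gives the compiler a chance to drop
// the per-access checks.
func sum8Hinted(b []byte) int {
	b = b[:8] // single check (and panic point) up front
	s := 0
	for i := 0; i < 8; i++ {
		s += int(b[i])
	}
	return s
}

func main() {
	data := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9}
	fmt.Println(sum8(data), sum8Hinted(data))
}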
I take my words back. This is not bounds checking, at least not on the compressible data. I decided to look at the distribution of actual tries (see findMatch).

gutenberg.txt:
0: 0 1: 7030274 2: 6152709 3: 5545541 4: 5087985 5: 4695312 6: 4343402 7: 4047307 8: 3772723 9: 3523595 10: 3302578 11: 3104393 12: 2923958 13: 2760165 14: 2609980 15: 2470225 16: 2344119 17: 2221547 18: 2114974 19: 2013985 20: 1925451 21: 1840720 22: 1767779 23: 1701087 24: 1640795 25: 1583392 26: 1529609 27: 1483136 28: 1436692 29: 1395554 30: 1349483 31: 1304079 32: 1258346 33: 1213482 34: 1168298 35: 1121334 36: 1076541 37: 1032775 38: 991983 39: 953072 40: 916813 41: 882443 42: 850751 43: 818134 44: 788817 45: 759793 46: 734664 47: 707436 48: 684333 49: 661871 50: 642868 51: 625065 52: 606764 53: 587920 54: 572157 55: 554170 56: 540519 57: 526047 58: 513336 59: 499707 60: 487171 61: 477184 62: 468153 63: 457002 64: 446098 65: 436812 66: 426783 67: 418369 68: 410367 69: 401949 70: 394634 71: 386615 72: 376867 73: 370831 74: 361638 75: 356390 76: 349221 77: 341120 78: 335207 79: 327415 80: 319881 81: 314124 82: 309320 83: 302548 84: 297445 85: 290371 86: 282705 87: 278527 88: 273746 89: 267290 90: 263012 91: 257661 92: 252553 93: 247228 94: 242580 95: 238915 96: 234916 97: 229083 98: 224706 99: 221516 100: 217734 101: 213113 102: 209530 103: 206625 104: 202630 105: 198859 106: 195559 107: 191800 108: 188813 109: 185493 110: 182859 111: 179525 112: 176293 113: 174206 114: 170770 115: 168445 116: 164788 117: 162212 118: 160223 119: 157799 120: 155457 121: 152843 122: 151004 123: 148517 124: 146771 125: 144575 126: 142698 127: 139581 128: 14601181

avi file:
ActualTriesMap: 0: 0 1: 304332953 2: 38984672 3: 3653538 4: 408016 5: 118865 6: 70082 7: 50490 8: 39877 9: 32578 10: 28537 11: 26114 12: 24136 13: 22615 14: 20953 15: 19872 16: 18952 17: 18199 18: 17621 19: 17402 20: 17296 21: 17029 22: 17012 23: 16717 24: 16608 25: 16154 26: 15905 27: 15975 28: 15847 29: 15544 30: 15608 31: 15535 32: 15200 33: 14892 34: 14562 35: 14527 36: 14196 37: 13670 38: 13399 39: 12826 40: 12277 41: 11834 42: 11352 43: 10750 44: 10180 45: 9630 46: 8940 47: 8300 48: 7865 49: 7216 50: 6734 51: 6199 52: 5806 53: 5296 54: 4837 55: 4452 56: 4063 57: 3651 58: 3215 59: 3011 60: 2679 61: 2420 62: 2407 63: 2221 64: 1976 65: 1703 66: 1592 67: 1418 68: 1218 69: 1117 70: 1039 71: 969 72: 877 73: 801 74: 746 75: 652 76: 573 77: 548 78: 481 79: 432 80: 396 81: 358 82: 330 83: 286 84: 265 85: 263 86: 214 87: 200 88: 161 89: 145 90: 130 91: 98 92: 93 93: 91 94: 79 95: 67 96: 57 97: 55 98: 44 99: 31 100: 27 101: 28 102: 22 103: 27 104: 21 105: 24 106: 16 107: 19 108: 16 109: 17 110: 16 111: 11 112: 12 113: 14 114: 11 115: 6 116: 5 117: 6 118: 7 119: 8 120: 10 121: 9 122: 5 123: 7 124: 10 125: 7 126: 13 127: 14 128: 6674

In the case of text, it's not hash collisions; it's the ground truth: there are very popular character-level trigrams (which ones may differ, depending on the actual data). I will think a bit about how to improve the data structure, because the last bucket in the text case suggests that we're doing a lot of worthless iterations.
Yes, it's an algorithmic issue with trigrams: http://en.wikipedia.org/wiki/Trigram

We spend too much time on "the", "and" and related strings, because the chains for them are really long. I am currently trying to produce a solution for hot trigrams. If my idea does not work, I will try to look into gzip, or kindly ask Jean-Loup Gailly to help.
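For reference, the classic zlib-style mitigation for hot chains is to bound the work per position: cap how many chain entries are examined, and stop early once a match is already "good enough". The sketch below uses illustrative constants and helpers, not the actual compress/flate or zlib code, and the tiny chain in main is faked by hand.

package main

import "fmt"

const (
	maxChainLength = 128 // give up after this many candidates
	niceLength     = 32  // a match this long is good enough; stop searching
)

// bestMatch walks the prev chain starting at head, but bounds the work so a
// very popular trigram cannot make a single position cost thousands of probes.
// Positions in head/prev are stored +1 so that 0 can mean "end of chain".
func bestMatch(data []byte, pos int, head int, prev []int) (length, offset int) {
	chain := maxChainLength
	for cand := head; cand != 0 && chain > 0; cand, chain = prev[cand-1], chain-1 {
		c := cand - 1
		n := 0
		for pos+n < len(data) && data[c+n] == data[pos+n] && n < 258 {
			n++
		}
		if n > length {
			length, offset = n, pos-c
			if length >= niceLength {
				break // good enough, don't burn time on the rest of the chain
			}
		}
	}
	return length, offset
}

func main() {
	data := []byte("the cat and the dog and the bird and the fish")
	// head/prev would normally be maintained by the insert loop; here we fake a
	// tiny chain for the trigram "the" at positions 24 -> 12 -> 0.
	prev := make([]int, len(data))
	prev[24] = 13 // position 12, stored +1
	prev[12] = 1  // position 0, stored +1
	length, offset := bestMatch(data, 37, 25, prev) // encode the final "the fish"
	fmt.Printf("best match: length=%d offset=%d\n", length, offset)
}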
Two updates:

1. A minor cleanup CL that does not affect performance (removing dead code): http://golang.org/cl/5577061/
2. A second CL that dramatically reduces the number of runtime.GC calls caused by huffmanEncoder: http://golang.org/cl/5573076/ (not completely ready to submit)

The second CL depends on the first (unintentionally). Stats are given for the maximum compression level (-9):

gutenberg:
gzip: 37.52
Go: 76 (2.02x)
patched Go: 74.43 (1.98x, a 2% improvement -- most of the time is spent in findMatch)

avi:
gzip: 46.21
Go: 90 (1.95x)
patched Go: 67.79 (1.47x, a 25% improvement)

Obviously, GC issues only affect large files, but it's probably the right step anyway. I will provide stats for other compression levels tomorrow.
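The allocation-reuse pattern behind that second CL, as a generic hedged sketch (the real huffmanEncoder fields and types differ): keep scratch slices on the encoder and reslice them to zero length for each block instead of reallocating, so steady-state compression stops producing per-block garbage for the collector to chase.

package main

import "fmt"

type node struct {
	literal uint16
	freq    int32
}

type encoder struct {
	scratch []node // reused across blocks instead of reallocated
}

// buildFreqList fills the scratch slice with the nonzero-frequency symbols.
// Appending to a zero-length reslice reuses the old backing array once it has
// grown to its steady-state size, so per-block allocations drop to zero.
func (e *encoder) buildFreqList(freq []int32) []node {
	e.scratch = e.scratch[:0]
	for lit, f := range freq {
		if f != 0 {
			e.scratch = append(e.scratch, node{uint16(lit), f})
		}
	}
	return e.scratch
}

func main() {
	e := &encoder{}
	freq := make([]int32, 286)
	freq['e'], freq['t'], freq['a'] = 120, 90, 75
	for block := 0; block < 3; block++ {
		list := e.buildFreqList(freq)
		fmt.Printf("block %d: %d symbols, cap %d (backing array reused)\n",
			block, len(list), cap(list))
	}
}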
Let's postpone any algorithmic changes to gzip until after Go 1. I'd like to let things settle so that we have some confidence in the stability of the code we're going to release. It's getting to be too late to be making significant changes, which could be harboring new bugs to find and fix (or miss).
Russ, is http://golang.org/cl/5573076/ in the category of algorithmic changes?
CL 5573076 was superseded by https://golang.org/cl/10006043 (revision 2fe813f4f3c2).