bkaradzic/go-lz4

Port of LZ4 lossless compression algorithm to Go
BSD 2-Clause "Simplified" License

hashTable allocating tons of memory #21

Open manishrjain opened 7 years ago

manishrjain commented 7 years ago
$ go tool pprof populate /tmp/profile764028996/mem.pprof
Entering interactive mode (type "help" for commands)
(pprof) list lz4.Encode
Total: 3.90GB
ROUTINE ======================== github.com/bkaradzic/go-lz4.Encode in /home/ubuntu/go/src/github.com/bkaradzic/go-lz4/writer.go
    3.70GB     3.70GB (flat, cum) 94.89% of Total
         .          .    107:   if len(src) >= MaxInputSize {
         .          .    108:           return nil, ErrTooLarge
         .          .    109:   }
         .          .    110:
         .          .    111:   if n := CompressBound(len(src)); len(dst) < n {
    8.52MB     8.52MB    112:           dst = make([]byte, n)
         .          .    113:   }
         .          .    114:
    3.69GB     3.69GB    115:   e := encoder{src: src, dst: dst, hashTable: make([]uint32, hashTableSize)}
         .          .    116:
         .          .    117:   binary.LittleEndian.PutUint32(dst, uint32(len(src)))
         .          .    118:   e.dpos = 4
         .          .    119:
         .          .    120:   var (

The allocation on line 115 is causing Badger to OOM when loading data really fast. Ideally, you want to reuse the same hashTable across Encode calls, which can be done via sync.Pool. Happy to send a PR if that'd help.

dgryski commented 7 years ago

I agree sync.Pool will help here. Please send a PR.