flanglet / kanzi

Fast lossless data compression in Java
Apache License 2.0

"index out of range" in BWT transform #5

Closed pzo closed 7 years ago

pzo commented 7 years ago

Running kanzi on a truncated "bib" file from the Calgary test suite generates an error message.

./Kanzi -compress -input=bib-truncated -output=bib.kanzi -transform=bwt -entropy=none -overwrite

Kanzi 1.0 (C) 2017, Frederic Langlet
Encoding ... panic: runtime error: index out of range

goroutine 5 [running]:
kanzi/transform.(*DivSufSort).ssMultiKeyIntroSort(0xc420072980, 0xb273, 0xb37, 0xe14, 0x2)
    /home/user/go/src/kanzi/transform/DivSufSort.go:1261 +0x530
kanzi/transform.(*DivSufSort).ssSort(0xc420072980, 0xb273, 0xb37, 0xe14, 0x522d, 0x6046, 0x2, 0x104a0, 0x0)
    /home/user/go/src/kanzi/transform/DivSufSort.go:452 +0x2ac
kanzi/transform.(*DivSufSort).sortTypeBstar(0xc420072980, 0xc42009f800, 0x100, 0x100, 0xc42020a000, 0x10000, 0x10000, 0x104a0, 0xc420072980)
    /home/user/go/src/kanzi/transform/DivSufSort.go:280 +0x8c8
kanzi/transform.(*DivSufSort).ComputeSuffixArray(0xc420072980, 0xc4201e6000, 0x104a0, 0x104a0, 0x104a1, 0x104a1, 0x1b600)
    /home/user/go/src/kanzi/transform/DivSufSort.go:112 +0xdb
kanzi/transform.(*BWT).Forward(0xc42006a8a0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8003, 0x104a1, 0x104a1, 0x0, 0xc420040c80, 0x40cd6d, ...)
    /home/user/go/src/kanzi/transform/BWT.go:137 +0x12e
kanzi/function.(*BWTBlockCodec).Forward(0xc42000c0b0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8000, 0x104a4, 0x104a4, 0x4396bb, 0x10, 0x10100000053d300, ...)
    /home/user/go/src/kanzi/function/BWTBlockCodec.go:73 +0x110
kanzi/function.(*ByteTransformSequence).Forward(0xc42000ade0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8000, 0x104a4, 0x104a4, 0x0, 0x0, 0x0, ...)
    /home/user/go/src/kanzi/function/ByteTransformSequence.go:85 +0x1eb
kanzi/io.(*EncodingTask).encode(0xc4200cc380)
    /home/user/go/src/kanzi/io/CompressedStream.go:466 +0xb97
created by kanzi/io.(*CompressedOutputStream).processBlock
    /home/user/go/src/kanzi/io/CompressedStream.go:391 +0x39d

bib.zip

flanglet commented 7 years ago

Hmm, that can't be good. Ok, the file works with the Java and C++ implementations, so I messed up something in the port to Go. Thanks for the report. I will take a deeper look and fix it when I have some time.

flanglet commented 7 years ago

Ok. Fixed. It was a silly bug with a simple fix.

./Kanzi -compress -input=/tmp/bib-truncated -output=none -entropy=none -transform=bwt

Kanzi 1.0 (C) 2017, Frederic Langlet
Encoding ...

Encoding: 4 ms
Input size: 66720
Output size: 66739
Ratio: 1.000285
Throughput (KB/s): 16289