flanglet / kanzi

Fast lossless data compression in Java
Apache License 2.0

BWTBlockCodec requires 1Gb+ of memory regardless of input or options #13

Closed: MalcolmOdd closed this issue 5 years ago.

MalcolmOdd commented 5 years ago

Running Kanzi on a 12 KB text file with the default codec allocates a buffer of at least 1 GB. See BWTBlockCodec.java:

    @Override
    public int getMaxEncodedLength(int srcLen)
    {
       return srcLen + BWT_MAX_HEADER_SIZE + BWT.maxBlockSize();
    }

Here BWT.maxBlockSize() returns a static final constant equal to 1 GB. Is that much memory really required? The compressed stream should normally be of the same order of size as the uncompressed stream, so maybe there should be a Math.min() instead of a sum.

With Java 8 on my computer and the default memory options, the process fails with an out-of-memory error. I need at least -Xmx4g to make it work.

Thanks
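To make the arithmetic in the report concrete, here is a minimal Java sketch. It is not kanzi's code: the class and method names are hypothetical, BWT_MAX_HEADER_SIZE is an assumed value, and only the 1 GB block size comes from the report. scaledBound is one possible reading of the Math.min suggestion, not necessarily what the fixing commits do.

```java
// Hypothetical sketch (not kanzi's code): compares the bound quoted in the issue
// with a bound that scales with the input size.
final class MaxEncodedLengthSketch {

    // Assumed header size in bytes; kanzi's real BWT_MAX_HEADER_SIZE may differ.
    static final int BWT_MAX_HEADER_SIZE = 32;

    // 1 GB, the value the issue reports for BWT.maxBlockSize().
    static final int MAX_BLOCK_SIZE = 1 << 30;

    // Bound as quoted in the issue: the full max block size is always added,
    // so even a tiny input yields a buffer of more than 1 GB.
    static long sumBound(int srcLen) {
        return (long) srcLen + BWT_MAX_HEADER_SIZE + MAX_BLOCK_SIZE;
    }

    // One reading of the Math.min suggestion (an assumption): grow with srcLen,
    // cap the payload at the max block size, and keep room for the header.
    static long scaledBound(int srcLen) {
        return Math.min(srcLen, MAX_BLOCK_SIZE) + (long) BWT_MAX_HEADER_SIZE;
    }

    public static void main(String[] args) {
        int srcLen = 12 * 1024; // the 12 KB input from the report
        System.out.println("sum bound:    " + sumBound(srcLen) + " bytes");
        System.out.println("scaled bound: " + scaledBound(srcLen) + " bytes");
    }
}
```

For the 12 KB input, sumBound comes to just over 1 GB, while scaledBound is only the input size plus the assumed header. The sketch returns long so that the sum cannot overflow an int when srcLen approaches the maximum block size.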

flanglet commented 5 years ago

Thanks for pointing that out. The code does look wrong (this change was introduced in January 2019) and there is no reason to allocate so much memory for small blocks.

flanglet commented 5 years ago

Fixed by commits f339e60 and 7ebf2b4.