geniot closed this issue 2 years ago.
Does #25 reproduce your case?
Yes, it does, on both Windows 10 and Ubuntu. Thanks for the test; it throws the same exception for me. I'm now trying to find the bug. BUF_LEN = 58315 seems to be an important constant: it should be used for both deflation and inflation.
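To show why the chunk length has to match on both sides, here is a small hypothetical helper (ChunkMath is my own name, not part of dictzip-java). It maps an uncompressed byte offset to a chunk index and an offset inside that chunk. If the writer and the reader disagree on the chunk length, the same offset lands in a different chunk. As far as I know, 58315 is also the chunk length the dictzip tool uses by default, but treat that as an assumption.

```java
/**
 * Hypothetical helper (not dictzip-java code) showing why the deflate side
 * and the inflate side must agree on one chunk length.
 */
final class ChunkMath {
    // The chunk length quoted in this thread.
    static final int BUF_LEN = 58315;

    private ChunkMath() {}

    /** Index of the chunk that holds the uncompressed byte at pos. */
    static long chunkIndex(long pos, int chunkLen) {
        return pos / chunkLen;
    }

    /** Position of pos inside its chunk. */
    static int offsetInChunk(long pos, int chunkLen) {
        return (int) (pos % chunkLen);
    }

    /** Number of chunks needed to cover totalLen uncompressed bytes. */
    static long chunkCount(long totalLen, int chunkLen) {
        return (totalLen + chunkLen - 1) / chunkLen;
    }
}
```

For example, offset 60000 falls in chunk 1 with a 58315-byte chunk length but in chunk 0 with a 65536-byte one, so a reader using the wrong constant would inflate the wrong chunk.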
Or did you mean that this test passes on your computer? In that case, which JDK are you using? I realize this exception may be related to https://bugs.openjdk.java.net/browse/JDK-8200671, but I have tried different JDK versions and vendors, and even different operating systems (Linux and Windows).
After investigating the test case, I found that compression with dictzip-java produces a broken archive: the Linux dictzip command fails to read the generated archive file.
Please consider the Linux dictzip command as an alternative for data compression. https://linux.die.net/man/1/dictzip https://packages.ubuntu.com/impish/dictzip https://sourceforge.net/projects/dict/
Thank you. I decided to write my own implementation: https://github.com/geniot/elex/blob/master/src/main/java/io/github/geniot/elex/ezip/model/ElexDictionary.java. It was inspired by dictzip, I think. The dictionaries at https://elex.mobi/index.html are compressed with ElexDictionary. The header is located at the end of the file (read through a RandomAccessFile) and contains chunk offsets and headword "starters".
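A minimal sketch of the "header at the end of the file" idea, assuming a made-up layout: the chunk data, then a chunk count and the chunk offsets, then an 8-byte pointer to where that header starts. This is not ElexDictionary's real format (among other things it leaves out the headword starters); TrailerHeader and its layout are hypothetical. The reader seeks to the last 8 bytes first, so it never has to scan the chunk data.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical trailer-header layout (NOT ElexDictionary's actual format):
 *   [chunk data ...][int chunkCount][long offset_0 ... offset_n-1][long headerStart]
 * The last 8 bytes of the file point back to where the header begins.
 */
final class TrailerHeader {
    private TrailerHeader() {}

    /** Writes the (already compressed) chunks, then a header listing their offsets. */
    static void write(File file, List<byte[]> chunks) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            List<Long> offsets = new ArrayList<>();
            long pos = 0;
            for (byte[] chunk : chunks) {
                offsets.add(pos);
                out.write(chunk);
                pos += chunk.length;
            }
            long headerStart = pos;
            out.writeInt(offsets.size());
            for (long off : offsets) {
                out.writeLong(off);
            }
            out.writeLong(headerStart); // fixed-size pointer in the last 8 bytes
        }
    }

    /** Reads the chunk offsets back by seeking to the end of the file first. */
    static long[] readOffsets(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(raf.length() - Long.BYTES);
            long headerStart = raf.readLong();
            raf.seek(headerStart);
            int count = raf.readInt();
            long[] offsets = new long[count];
            for (int i = 0; i < count; i++) {
                offsets[i] = raf.readLong();
            }
            return offsets;
        }
    }
}
```

Keeping the pointer in a fixed-size trailer lets the writer stream the chunks out first and emit the offsets only once they are known.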
I think #38 fixes the bug. There are two root causes. First, the deflater should use the FULL_FLUSH flag, but it used SYNC_FLUSH.
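Why the flush mode matters: a random-access reader inflates each chunk with a fresh Inflater, so a chunk must not refer back to bytes in the previous chunk. The sketch below (ChunkedDeflate is a hypothetical name, not dictzip-java's code) raw-deflates the input chunk by chunk with java.util.zip.Deflater and records where each chunk starts in the compressed output. FULL_FLUSH resets the compression state at every boundary, so each chunk decodes on its own. SYNC_FLUSH also byte-aligns the output, but later chunks may contain back-references into earlier ones, and inflating them standalone then fails.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: chunked raw deflate with a configurable flush mode. */
final class ChunkedDeflate {
    /** Compressed bytes plus the start offset of each chunk inside them. */
    static final class Result {
        final byte[] data;
        final int[] offsets;
        Result(byte[] data, int[] offsets) { this.data = data; this.offsets = offsets; }
    }

    private ChunkedDeflate() {}

    /** Deflates input in chunkLen slices, flushing with flushMode after each slice. */
    static Result compress(byte[] input, int chunkLen, int flushMode) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate, no zlib header
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int chunks = (input.length + chunkLen - 1) / chunkLen;
        int[] offsets = new int[chunks];
        for (int c = 0; c < chunks; c++) {
            offsets[c] = out.size();
            int start = c * chunkLen;
            deflater.setInput(input, start, Math.min(chunkLen, input.length - start));
            int n;
            // Per the Deflater javadoc, repeat the flush while the output buffer came back full.
            do {
                n = deflater.deflate(buf, 0, buf.length, flushMode);
                out.write(buf, 0, n);
            } while (n == buf.length);
        }
        deflater.finish();
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return new Result(out.toByteArray(), offsets);
    }

    /** Inflates chunk c alone with a fresh Inflater, as a random-access reader would. */
    static byte[] inflateChunk(Result r, int c, int expectedLen) throws DataFormatException {
        int end = c + 1 < r.offsets.length ? r.offsets[c + 1] : r.data.length;
        Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(r.data, r.offsets[c], end - r.offsets[c]);
            byte[] out = new byte[expectedLen];
            int got = 0;
            while (got < expectedLen) {
                int n = inflater.inflate(out, got, expectedLen - got);
                if (n == 0 && (inflater.needsInput() || inflater.finished() || inflater.needsDictionary())) {
                    break; // no more compressed bytes for this chunk
                }
                got += n;
            }
            return Arrays.copyOf(out, got);
        } finally {
            inflater.end();
        }
    }
}
```

With an input made of two identical 4000-byte halves and chunkLen = 4000, inflating chunk 1 alone should succeed for FULL_FLUSH output and fail for SYNC_FLUSH output, because with SYNC_FLUSH the compressor is free to encode the second half as references to the first.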
v0.10.3 with the fix is out.
I have a 45 MB text file that I compress with DictZipOutputStream into an 11 MB file. When I then seek and read it with DictZipInputStream, I get:
Could you try to dictzip a large file, then seek to the end and read the last bytes? I think there's something wrong with the header.
Strangely, though, if I read the whole file with seek(0) followed by read(bbs.length), I get all the data. So the data is in the file, but the seek-then-read sequence does not work correctly on a large file.
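For the seek-then-read scenario, here is a sketch of what a correct random-access read has to do: find the chunk containing the start position, inflate only that chunk, and continue into the following chunks when the requested range crosses a boundary. SeekableChunks is hypothetical and keeps everything in memory. To stay short it compresses each chunk with its own Deflater, which gives the same chunk independence as FULL_FLUSH at the cost of a separate zlib header per chunk. It is not how DictZipInputStream is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical in-memory store where every chunk is compressed independently. */
final class SeekableChunks {
    private final int chunkLen;
    private final long totalLen;
    private final List<byte[]> compressed = new ArrayList<>();

    SeekableChunks(byte[] input, int chunkLen) {
        this.chunkLen = chunkLen;
        this.totalLen = input.length;
        for (int start = 0; start < input.length; start += chunkLen) {
            Deflater d = new Deflater(); // fresh state per chunk, so no cross-chunk references
            d.setInput(input, start, Math.min(chunkLen, input.length - start));
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            d.end();
            compressed.add(out.toByteArray());
        }
    }

    /** Returns up to len bytes starting at pos, crossing chunk boundaries as needed. */
    byte[] read(long pos, int len) throws DataFormatException {
        if (pos < 0 || pos > totalLen) throw new IllegalArgumentException("pos out of range: " + pos);
        int n = (int) Math.min(len, totalLen - pos);
        byte[] result = new byte[n];
        int copied = 0;
        while (copied < n) {
            long p = pos + copied;
            int chunk = (int) (p / chunkLen);   // chunk that holds byte p
            int inChunk = (int) (p % chunkLen); // where p sits inside that chunk
            byte[] plain = inflate(compressed.get(chunk), chunkSize(chunk));
            int take = Math.min(n - copied, plain.length - inChunk);
            System.arraycopy(plain, inChunk, result, copied, take);
            copied += take;
        }
        return result;
    }

    private int chunkSize(int chunk) {
        return (int) Math.min(chunkLen, totalLen - (long) chunk * chunkLen);
    }

    private static byte[] inflate(byte[] data, int plainLen) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(data);
            byte[] out = new byte[plainLen];
            int got = 0;
            while (got < plainLen) {
                int k = inf.inflate(out, got, plainLen - got);
                if (k == 0 && (inf.needsInput() || inf.finished() || inf.needsDictionary())) {
                    throw new DataFormatException("chunk ended after " + got + " of " + plainLen + " bytes");
                }
                got += k;
            }
            return out;
        } finally {
            inf.end();
        }
    }
}
```

With a 200,000-byte input and chunkLen = 58315, reading the last 10 bytes inflates only the fourth and final chunk, while reading 20 bytes starting at offset 58310 takes 5 bytes from chunk 0 and 15 from chunk 1.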