airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0

Does Zstd decompress support files compressed by a newer C version? #150

Closed. coolcfxp closed this 2 years ago

coolcfxp commented 2 years ago

I was recently trying to use this library to decode a compressed HDF5 file. The compressing side used the native library, version 1.4.5, and when I tried to decode the data it reported "Input is corrupted: offset=2305". When I switched to zstd-jni, it worked. I wonder whether some later versions, strategies, or compression levels are not supported?
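For anyone hitting the same error, this is roughly how a single zstd-compressed chunk can be fed through aircompressor's `ZstdDecompressor` (a minimal sketch, not code from this report; the file path and output buffer size are placeholders, and in the HDF5 case the real uncompressed size would come from the chunk metadata):

```java
import io.airlift.compress.MalformedInputException;
import io.airlift.compress.zstd.ZstdDecompressor;

import java.nio.file.Files;
import java.nio.file.Paths;

public class DecodeChunk
{
    public static void main(String[] args)
            throws Exception
    {
        // A single zstd frame extracted from the larger file (placeholder path).
        byte[] compressed = Files.readAllBytes(Paths.get("chunk.zst"));

        // Placeholder size; the real value should be the chunk's uncompressed size.
        byte[] output = new byte[1 << 20];

        ZstdDecompressor decompressor = new ZstdDecompressor();
        try {
            int written = decompressor.decompress(compressed, 0, compressed.length, output, 0, output.length);
            System.out.println("decompressed " + written + " bytes");
        }
        catch (MalformedInputException e) {
            // This is the kind of "Input is corrupted: offset=..." failure described above.
            e.printStackTrace();
        }
    }
}
```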

dain commented 2 years ago

It is likely a bug. Can you share a file that reproduces this? These kinds of bugs are generally impossible to fix without a reproduction.

dain commented 2 years ago

Also a stack trace would help.

BTW, did you try decoding the file to make sure it was not corrupted? Some of the native zstd libraries disable the corruption checks to improve performance, so decoding "works" but the output is actually corrupted.

coolcfxp commented 2 years ago

I was using this along with jhdf (https://github.com/jamesmudd/jhdf), as a filter. When I reported this issue I thought it was the decoder's problem, but later, when I isolated the frame of data that had reported the error into a separate file and decoded it with aircompressor, it did not report an error.

So I thought it might be a threading issue. I then tried using ThreadLocal, and also just creating a new instance every time I needed one; the old corruption error disappeared, but instead I get a core dump, which I could not make sense of.
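A per-thread instance along these lines is one way to rule out shared state (a sketch of the ThreadLocal approach just mentioned, not the actual code from this report; the helper class name is made up, and whether this is needed depends on the decompressor's thread-safety guarantees):

```java
import io.airlift.compress.zstd.ZstdDecompressor;

public final class ZstdDecoders
{
    // One decompressor per thread, so concurrent HDF5 filter calls never share an instance.
    private static final ThreadLocal<ZstdDecompressor> DECOMPRESSOR =
            ThreadLocal.withInitial(ZstdDecompressor::new);

    private ZstdDecoders() {}

    public static int decompress(byte[] input, byte[] output)
    {
        return DECOMPRESSOR.get()
                .decompress(input, 0, input.length, output, 0, output.length);
    }
}
```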

hs_err_pid2188.log

The original file is about 1.7 GB; I'm not sure how to send it to you.

dain commented 2 years ago

The hs error log doesn't really help, because it just says that the VM crashed in the core JVM binary with no reference to aircompressor.

I'm not going to be able to dig through a 1.7GB file in a format I'm unfamiliar with. Can you isolate the problematic compressed frame and post just that frame?

coolcfxp commented 2 years ago

The problem is, when I did so, there was no problem. I have to guess it has something to do with the jhdf library as well. Let me raise this issue with the jhdf maintainer and see if they have any ideas. I will update when there is more to report.

coolcfxp commented 2 years ago

I think I found the bug, and it was in my own code: I accidentally passed an output length larger than the size of the output array. It works now. Sorry for bothering you with my own bug, and thank you very much.
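For the record, the mistake was in the last argument to decompress: passing a limit larger than the output array. A sketch of the wrong call versus the fixed one (the class and variable names are illustrative, not taken from the actual filter code):

```java
import io.airlift.compress.zstd.ZstdDecompressor;

public class ChunkFilter
{
    static byte[] decode(byte[] input, int uncompressedSize)
    {
        byte[] output = new byte[uncompressedSize];
        ZstdDecompressor decompressor = new ZstdDecompressor();

        // Bug: a limit larger than output.length tells the decompressor it may
        // write past the end of the array, which can corrupt memory or crash the JVM.
        // decompressor.decompress(input, 0, input.length, output, 0, uncompressedSize + 4096);

        // Fix: the maxOutputLength argument must never exceed the space actually available.
        decompressor.decompress(input, 0, input.length, output, 0, output.length);
        return output;
    }
}
```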