airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0
562 stars 111 forks source link

Got "Overflow detected" at presto-orc with zstd #122

Closed miniway closed 2 years ago

miniway commented 3 years ago

I got "Overflow detected" at creating an ORC file through presto-orc from large input data so I could not easily reproduce. That might be because I'm using a bit old version of presto 317.

java.lang.IllegalStateException: Overflow detected
    at io.airlift.compress.zstd.Util.checkState(Util.java:59)
    at io.airlift.compress.zstd.BitOutputStream.close(BitOutputStream.java:85)
    at io.airlift.compress.zstd.HuffmanCompressor.compressSingleStream(HuffmanCompressor.java:130)
    at io.airlift.compress.zstd.HuffmanCompressor.compress4streams(HuffmanCompressor.java:75)
    at io.airlift.compress.zstd.ZstdFrameCompressor.encodeLiterals(ZstdFrameCompressor.java:333)
    at io.airlift.compress.zstd.ZstdFrameCompressor.compressBlock(ZstdFrameCompressor.java:224)
    at io.airlift.compress.zstd.ZstdFrameCompressor.compressFrame(ZstdFrameCompressor.java:172)
    at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:145)
    at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
    at io.prestosql.orc.OrcOutputBuffer.writeChunkToOutputStream(OrcOutputBuffer.java:445)
    at io.prestosql.orc.OrcOutputBuffer.flushBufferToOutputStream(OrcOutputBuffer.java:425)
    at io.prestosql.orc.OrcOutputBuffer.close(OrcOutputBuffer.java:146)
    at io.prestosql.orc.stream.LongOutputStreamV2.close(LongOutputStreamV2.java:739)
    at io.prestosql.orc.writer.SliceDirectColumnWriter.close(SliceDirectColumnWriter.java:139)
    at io.prestosql.orc.writer.SliceDictionaryColumnWriter.close(SliceDictionaryColumnWriter.java:324)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
    at io.prestosql.orc.OrcWriter.bufferStripeData(OrcWriter.java:369)
    at io.prestosql.orc.OrcWriter.flushStripe(OrcWriter.java:331)
    at io.prestosql.orc.OrcWriter.close(OrcWriter.java:444)

But I have a question on the these two lines as I'm not familiar with the zstd. If currentAddress < outputLimit is expected at line 85, currentAddress = outputLimit looks confusing at line 73. Just raising an exception here might be less confusing.

https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/zstd/BitOutputStream.java#L73

https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/zstd/BitOutputStream.java#L85

findepi commented 3 years ago

Thanks @miniway for the report! Can you perhaps try to reproduce this in a synthetic way, eg using tpch data as the input? That would also help verify the problem still exists in the latest release.

miniway commented 3 years ago

Unfortunately I could not easily reproduce the issue, but my gut feeling is it doesn't seem to happen at 0.15 but it happens rarely at 0.18

dongjoon-hyun commented 3 years ago

Hi, All. Apache ORC community also is blocked by this issue. This causes UT failures consistently. Apache ORC community has been using 0.16 without this issue. New 0.17 and 0.18 seems to have this issue.

The relevant change of airlift is this commit.

martint commented 3 years ago

Thanks for narrowing down the issue to that commit. I’ll take a look soon.

martint commented 3 years ago

Also, do you have a data set that reproduces it consistently? I’d like to investigate whether there’s a bug elsewhere that is being surfaced by that change.

martint commented 3 years ago

The formula matches the one in the reference implementation, so this must be a bug somewhere else. It would help a lot to have a dataset that reproduces the problem.

define ZSTD_COMPRESSBOUND(srcSize) ((srcSize) + ((srcSize)>>8) + (((srcSize) < (128<<10)) ? (((128<<10) - (srcSize)) >> 11) / margin, from 64 to 0 / : 0))

dongjoon-hyun commented 3 years ago

Update: Apache ORC upgrades to 0.19 finally. Thanks!

melin commented 3 years ago

aircompression 0.20, The same problem occurred

image

dongjoon-hyun commented 3 years ago

Thank you for reporting, @melin . If you don't mind, could you file an Apache ORC JIRA issue at https://issues.apache.org/jira/projects/ORC with the above information, please?

martint commented 3 years ago

I've been able to reproduce this locally. I'm investigating.

dongjoon-hyun commented 3 years ago

Thank you, @martint !