airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0

LzopCodec (.lzo) failure: "Unsupported LZO flags 50331649" #97

Closed findepi closed 3 months ago

findepi commented 5 years ago

What I did:

  1. Take Hive from Presto product tests (presto-product-tests/conf/docker/singlenode/compose.sh up, i.e. hdp2.6-hive)
  2. yum install -y lzop
  3. create a text file with abc\n and compress it with lzop -o output.lzo inputfile
  4. create a table with format = 'TEXTFILE' in Presto, add the file to it

Observed

Caused by: java.io.IOException: Unsupported LZO flags 50331649
    at io.airlift.compress.lzo.HadoopLzopInputStream.<init>(HadoopLzopInputStream.java:93)
    at io.airlift.compress.lzo.LzopCodec.createInputStream(LzopCodec.java:91)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:122)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at io.prestosql.plugin.hive.HiveUtil.createRecordReader(HiveUtil.java:220)
    ... 20 more
dain commented 5 years ago

The general strategy for LZO is to implement only what Hive uses (and generally only what it uses by default), because LZO has tons of flags that rarely get used. If this is important, we can dig into it to see what would need to be implemented.

dain commented 5 years ago

For LZOP, Hadoop doesn't set any flags, so we never dug into that. Looking at the source for LZOP, it looks like most of the flags are junk we can ignore, like which OS the file was created on. The code is here: https://github.com/mirror/lzop/blob/1941c6fb1c8f5616aa74144fba09a13013f24f45/src/conf.h
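To make the reported value easier to read, here is a minimal sketch that splits 50331649 into its fields. The mask and shift constants are written from memory of the linked conf.h and should be checked against it; the class and method names are made up for illustration and are not part of aircompressor.

```java
public final class LzopFlagsDecoder {
    // Assumed values, recalled from lzop's conf.h; verify against the linked source.
    static final int F_ADLER32_D = 0x00000001; // Adler-32 of the uncompressed data
    static final int F_MASK = 0x00003fff;      // low "feature" bits
    static final int F_CS_MASK = 0x00f00000;   // character set of the stored name
    static final int F_CS_SHIFT = 20;
    static final int F_OS_MASK = 0xff000000;   // OS the file was created on
    static final int F_OS_SHIFT = 24;

    static int osField(int flags) {
        return (flags & F_OS_MASK) >>> F_OS_SHIFT;
    }

    static int charsetField(int flags) {
        return (flags & F_CS_MASK) >>> F_CS_SHIFT;
    }

    static int featureBits(int flags) {
        return flags & F_MASK;
    }

    public static void main(String[] args) {
        int flags = 50331649; // value from the exception message
        System.out.printf("flags    = 0x%08x%n", flags);
        System.out.printf("os       = %d%n", osField(flags));
        System.out.printf("charset  = %d%n", charsetField(flags));
        System.out.printf("features = 0x%x%n", featureBits(flags));
    }
}
```

Under these assumptions the value is 0x03000001: an OS field of 3 (Unix, if the constants are right) plus a single feature bit, 0x1, which would be the Adler-32 checksum on uncompressed data.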

findepi commented 5 years ago

I ran into this with @ilfrin today while troubleshooting an issue. Of course, files normally won't be created with the lzop CLI... unless someone is doing what we did: troubleshooting.

> it looks like most of the flags are junk we can ignore, like which OS the file was created on

It would be great to just ignore them.
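One way "ignoring them" could look is sketched below: drop the informational OS and character-set bits first, then check only what remains. This is an illustration, not aircompressor's actual code. The mask values are recalled from lzop's conf.h and should be verified there, and `supportedBits` is a hypothetical parameter standing in for whatever feature bits a reader really handles.

```java
public final class LenientLzopFlagCheck {
    // Assumed values, recalled from lzop's conf.h; verify against the source.
    static final int F_OS_MASK = 0xff000000; // OS the file was created on
    static final int F_CS_MASK = 0x00f00000; // character set of the stored name

    // Clears the bits that only describe where the file came from.
    static int withoutInformationalBits(int flags) {
        return flags & ~(F_OS_MASK | F_CS_MASK);
    }

    // True when every remaining bit is in the caller's supported set.
    // supportedBits is hypothetical; a real reader defines its own set.
    static boolean isSupported(int flags, int supportedBits) {
        int remaining = withoutInformationalBits(flags);
        return (remaining & ~supportedBits) == 0;
    }

    public static void main(String[] args) {
        int flags = 50331649; // value from the exception message
        System.out.println(withoutInformationalBits(flags));
        System.out.println(isSupported(flags, 0x1));
        System.out.println(isSupported(flags, 0x0));
    }
}
```

With this approach the file from the report would be accepted only if the reader already handles the one remaining bit (0x1); anything the reader truly cannot process would still be rejected.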

findepi commented 5 years ago

For reference, here's the file

base64 -d <<"EOF" >moje.lzo
iUxaTwANChoKECAgMAlAAQUDAAABAACBpFyPynkAAAAAQDIwMTkwMzE4XzE2NDAxMl8wMDAxMl9r
dWVyN19jOWMwZDc3Yi0yOGFmLTRhYmItYjBiNS01MjQ0NTZiY2U0YTFfIRV+AAAABAAAAAQDfgEx
YWJjCgAAAAA=
EOF
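The flags word can be read straight out of the attached file as a cross-check. The sketch below decodes only the first 28 base64 characters of the file above (21 bytes) and reads a big-endian int at offset 17. That offset assumes the usual lzop header layout (9-byte magic, then 2-byte version, 2-byte library version, 2-byte "version needed", 1-byte method, 1-byte level) and has not been checked against every lzop version; the class name is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

public final class LzopHeaderPeek {
    // First 28 base64 characters of the attached file (decodes to 21 bytes).
    static final String HEADER_PREFIX_BASE64 = "iUxaTwANChoKECAgMAlAAQUDAAAB";

    // Assumed offset: 9 (magic) + 2 + 2 + 2 (versions) + 1 (method) + 1 (level).
    static final int FLAGS_OFFSET = 17;

    static int readFlags(byte[] header) {
        // ByteBuffer is big-endian by default, matching the on-disk byte order.
        return ByteBuffer.wrap(header).getInt(FLAGS_OFFSET);
    }

    public static void main(String[] args) {
        byte[] header = Base64.getDecoder().decode(HEADER_PREFIX_BASE64);
        int flags = readFlags(header);
        System.out.printf("flags = %d (0x%08x)%n", flags, flags);
    }
}
```

For this file the value read at that offset matches the number in the exception message, 50331649.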