Closed findepi closed 3 months ago
The general strategy for LZO is to only implement what Hive uses (and generally what it uses by default), because LZO has tons of flags that rarely get used. If this is important, we can dig into it to see what would need to be implemented.
For LZOP, Hadoop doesn't set any flags, so we never dug into that. Looking at the source for LZOP, it looks like most of the flags are junk we can ignore, like which OS the file was created on. The code is here: https://github.com/mirror/lzop/blob/1941c6fb1c8f5616aa74144fba09a13013f24f45/src/conf.h
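To make the flag discussion concrete, here is a minimal Python sketch of how a reader might split the 32-bit lzop header flags word into groups. The bit values are transcribed from my recollection of the `conf.h` linked above and should be verified against it. The grouping into "checksums", "layout" and "informational" is my own assumption for illustration, and `classify_flags` is a hypothetical helper, not something Presto, Hadoop or lzop provides.

```python
# Flag bits as recalled from lzop's conf.h; verify against the linked source.
# The three groupings below are an assumption made for this sketch.
CHECKSUM_FLAGS = {  # add or change checksum fields the reader must consume
    0x00000001: "F_ADLER32_D",
    0x00000002: "F_ADLER32_C",
    0x00000100: "F_CRC32_D",
    0x00000200: "F_CRC32_C",
    0x00001000: "F_H_CRC32",
}
LAYOUT_FLAGS = {  # change the structure of the header or stream
    0x00000040: "F_H_EXTRA_FIELD",
    0x00000400: "F_MULTIPART",
    0x00000800: "F_H_FILTER",
}
INFO_FLAGS = {  # describe how the file was produced; assumed safe to ignore
    0x00000004: "F_STDIN",
    0x00000008: "F_STDOUT",
    0x00000010: "F_NAME_DEFAULT",
    0x00000020: "F_DOSISH",
    0x00000080: "F_H_GMTDIFF",
    0x00002000: "F_H_PATH",
}
F_CS_MASK = 0x00F00000  # character set of the stored file name
F_OS_MASK = 0xFF000000  # operating system the file was created on


def classify_flags(flags):
    """Split an lzop flags word into the groups defined above."""
    def names(table):
        return [name for bit, name in sorted(table.items()) if flags & bit]

    known = F_OS_MASK | F_CS_MASK
    for table in (CHECKSUM_FLAGS, LAYOUT_FLAGS, INFO_FLAGS):
        for bit in table:
            known |= bit

    return {
        "checksums": names(CHECKSUM_FLAGS),
        "layout": names(LAYOUT_FLAGS),
        "informational": names(INFO_FLAGS),
        "os": (flags & F_OS_MASK) >> 24,
        "charset": (flags & F_CS_MASK) >> 20,
        "reserved": flags & ~known & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    # Example word: OS field set to 3, plus F_ADLER32_D.
    print(classify_flags(0x03000001))
```

Under this split, a lenient reader could ignore the `informational`, `os` and `charset` parts entirely and only reject files whose `layout` or `reserved` parts are non-empty. Checksum flags cannot simply be dropped, since they add bytes to the stream that still have to be skipped or verified.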
I ran into this with @ilfrin today when troubleshooting some issue.
Of course, normally files won't be created with the lzop CLI... unless someone is doing what we did, troubleshooting some issues.
> it looks like most of the flags are junk we can ignore, like which OS the file was created on

It would be great to just ignore them.
For reference, here's the file:
base64 -d <<"EOF" >moje.lzo
iUxaTwANChoKECAgMAlAAQUDAAABAACBpFyPynkAAAAAQDIwMTkwMzE4XzE2NDAxMl8wMDAxMl9r
dWVyN19jOWMwZDc3Yi0yOGFmLTRhYmItYjBiNS01MjQ0NTZiY2U0YTFfIRV+AAAABAAAAAQDfgEx
YWJjCgAAAAA=
EOF
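For anyone who wants to see which flags this particular file carries without installing lzop, the following Python sketch decodes the same base64 payload and walks the header. The field layout is my reading of the lzop format for header versions 0x0940 and later with no optional fields, so treat it as an assumption. `parse_header` and `read_first_block` are hypothetical helpers, and the sketch only handles a block that lzop stored uncompressed, which is what happens when compression would not shrink the data.

```python
import base64
import struct
import zlib

LZOP_MAGIC = bytes([0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A])
F_ADLER32_D = 0x00000001  # value as recalled from lzop's conf.h
# Bits this sketch tolerates: OS field, charset field, Adler-32 of block data.
SUPPORTED_FLAGS = 0xFF000000 | 0x00F00000 | F_ADLER32_D

# The attached file, copied verbatim from the heredoc above.
MOJE_LZO = base64.b64decode(
    "iUxaTwANChoKECAgMAlAAQUDAAABAACBpFyPynkAAAAAQDIwMTkwMzE4XzE2NDAxMl8wMDAxMl9r"
    "dWVyN19jOWMwZDc3Yi0yOGFmLTRhYmItYjBiNS01MjQ0NTZiY2U0YTFfIRV+AAAABAAAAAQDfgEx"
    "YWJjCgAAAAA="
)


def parse_header(buf):
    """Parse an lzop header; returns (fields, offset of the first block)."""
    if buf[:9] != LZOP_MAGIC:
        raise ValueError("not an lzop file")
    version, lib_version, needed, method, level, flags = struct.unpack_from(
        ">HHHBBI", buf, 9)
    if version < 0x0940:
        raise NotImplementedError("older header layout is not handled here")
    if flags & ~SUPPORTED_FLAGS:
        raise NotImplementedError("unhandled flags: 0x%08x" % flags)
    mode, mtime_low, mtime_high, name_len = struct.unpack_from(">IIIB", buf, 21)
    name_end = 34 + name_len
    (stored_checksum,) = struct.unpack_from(">I", buf, name_end)
    fields = {
        "version": version,
        "lib_version": lib_version,
        "method": method,
        "level": level,
        "flags": flags,
        "mode": mode,
        "mtime": mtime_low,
        "name": buf[34:name_end].decode("ascii"),
        # Assumption: the header checksum is Adler-32 over the bytes
        # between the magic and the checksum field itself.
        "header_checksum_ok": zlib.adler32(buf[9:name_end]) == stored_checksum,
    }
    return fields, name_end + 4


def read_first_block(buf, pos, flags):
    """Return the first block's data if it was stored uncompressed."""
    uncompressed_len, compressed_len = struct.unpack_from(">II", buf, pos)
    pos += 8
    expected = None
    if flags & F_ADLER32_D:
        (expected,) = struct.unpack_from(">I", buf, pos)
        pos += 4
    if compressed_len != uncompressed_len:
        raise NotImplementedError("block is compressed; needs an LZO1X decoder")
    data = buf[pos:pos + compressed_len]
    if expected is not None and zlib.adler32(data) != expected:
        raise ValueError("data checksum mismatch")
    return data


if __name__ == "__main__":
    header, first_block = parse_header(MOJE_LZO)
    print("flags: 0x%08x" % header["flags"])
    print("name:", header["name"])
    print("data:", read_first_block(MOJE_LZO, first_block, header["flags"]))
```

By my reading, the flags word in this file is 0x03000001: the OS byte set to 3 plus `F_ADLER32_D`. The single block holds the four bytes `abc\n` stored as-is, followed by a zero-length end marker.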
What I did:
- `presto-product-tests/conf/docker/singlenode/compose.sh up` (i.e. `hdp2.6-hive`)
- `yum install -y lzop`
- create a file containing `abc\n` and compress it with `lzop -o output.lzo inputfile`
- create a table with `format = 'TEXTFILE'` in Presto, add the file to it

Observed
- add `.m2/repository/org/anarres/lzo/lzo-core/1.0.5/lzo-core-1.0.5.jar` and `.m2/repository/org/anarres/lzo/lzo-hadoop/1.0.5/lzo-hadoop-1.0.5.jar` to classpath and enable the codec in site.xml