apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

TransCompressionCommand Inoperable #2552

Open asfimport opened 3 years ago

asfimport commented 3 years ago

TransCompressionCommand in parquet-tools is intended to translate the compression type of Parquet files. We intend to use this functionality to debug a corrupted file, but at the moment the command fails to run at all.

Running the following command (on the uncorrupted file):


java -jar ./parquet-tools-1.11.1.jar trans-compression ~/Downloads/part-00048-69f65188-94b5-4772-8906-5c78989240b5_00048.c000.snappy.parquet

This results in:

Unknown command: trans-compression

I believe this is because the Registry class silently catches any errors raised while initializing a command, and the resulting missing entry is then misreported as an unknown command.
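To make the suspected failure mode concrete, here is a minimal sketch of a command registry that records initialization errors instead of discarding them, so a lookup can tell "never registered" apart from "failed to initialize". This is not the parquet-tools Registry: the class `RegistrySketch`, the `Command` interface, and the command names `cat` and `broken` are all hypothetical, and whether the real Registry behaves this way is only my guess.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the parquet-tools Registry class. */
public class RegistrySketch {
    /** Stand-in for a CLI command (assumed shape, for illustration). */
    public interface Command {
        String name();
    }

    // Commands that initialized successfully.
    private final Map<String, Command> commands = new LinkedHashMap<>();
    // Initialization failures, kept instead of being silently dropped.
    private final Map<String, Throwable> failures = new LinkedHashMap<>();

    public void register(String name, Supplier<Command> factory) {
        try {
            commands.put(name, factory.get());
        } catch (RuntimeException | LinkageError e) {
            // Swallowing this without a record is what would make a broken
            // command indistinguishable from one that does not exist.
            failures.put(name, e);
        }
    }

    public String describe(String name) {
        if (commands.containsKey(name)) {
            return "ok: " + name;
        }
        Throwable cause = failures.get(name);
        if (cause != null) {
            return "Failed to initialize command: " + name + " (" + cause + ")";
        }
        return "Unknown command: " + name;
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register("cat", () -> () -> "cat");
        registry.register("broken", () -> {
            throw new IllegalStateException("missing dependency");
        });
        System.out.println(registry.describe("cat"));
        System.out.println(registry.describe("broken"));
        System.out.println(registry.describe("trans-compression"));
    }
}
```

With a registry along these lines, a command that was never registered still reports "Unknown command", while one that failed during setup reports its cause, which is the distinction a test for each command could assert on.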

We need to: 

  1. Write a test for TransCompressionCommand to figure out why it is showing up as an unknown command
  2. Probably expand these tests to cover all the other commands

This will then unblock our debugging work on the suspect file.

Environment: I am using parquet-tools 1.11.1 on a Mac machine running Catalina, and my parquet-tools jar was downloaded from Maven Central.

Reporter: Shelby Vanhooser

Note: This issue was originally created as PARQUET-1948. Please see the migration documentation for further details.

asfimport commented 3 years ago

Gabor Szadovszky / @gszadovszky: [~vanhooser], this feature is to be released in 1.12.0 (see PARQUET-1872), so the error message is correct for release 1.11.1. Could you please test the feature on master?

asfimport commented 3 years ago

Xinli Shang / @shangxinli: [~vanhooser], glad to see you are interested in this tool. We have been using it to translate existing Parquet files from GZIP to ZSTD. Let me know if you hit any issues.