apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

parquet-cli broken in master #2657

Open asfimport opened 3 years ago

asfimport commented 3 years ago

Creating a Jira per this thread:

https://lists.apache.org/thread/k233838g010lvbp81s99floqjmm7nnvs

  1. clone parquet-mr and build the repo locally
  2. run parquet-cli without Hadoop (according to this README: <https://github.com/apache/parquet-mr/tree/master/parquet-cli#running-without-hadoop>)
  3. try a command that deserializes data such as cat or head
  4. observe NoSuchMethodError being thrown

Error stack:

~/repos/parquet-mr/parquet-cli$ parquet cat ../../testdata/dictionaryEncodingSample.parquet
WARNING: An illegal reflective access operation has occurred
......
Exception in thread "main" java.lang.NoSuchMethodError: 'org.apache.avro.Schema org.apache.parquet.avro.AvroSchemaConverter.convert(org.apache.parquet.schema.MessageType)'
    at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
    at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
    at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
    at org.apache.parquet.cli.Main.run(Main.java:157)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.parquet.cli.Main.main(Main.java:187)

Environment: Ubuntu 18.04 and Ubuntu 20.04

Reporter: Balaji K

Note: This issue was originally created as PARQUET-2104. Please see the migration documentation for further details.

asfimport commented 3 years ago

Balaji K: Is there a workaround that I can possibly use until this is fixed?

If there is an older build that would work, how should I go about installing that? 

asfimport commented 3 years ago

Gabor Szadovszky / @gszadovszky: [~gamaken], I am not sure about a workaround. I've tried this on master as well as on the release tags 1.12.2 and 1.11.2. They all fail the same way. :(

One idea is to use parquet-tools instead of parquet-cli. It has similar functionality. However, parquet-tools has been deprecated in 1.12.0 and removed in the current master. You may want to try it with an older tag (e.g. apache-parquet-1.11.2).

asfimport commented 2 years ago

Timothy Miller / @theosib-amazon: As I mentioned in https://issues.apache.org/jira/browse/PARQUET-2142, there's a workaround for this. target/parquet-cli-1.13.0-SNAPSHOT-runtime.jar contains duplicates of methods that should be picked up from the dependencies instead. You can run without Hadoop and exclude the runtime jar by putting just target/parquet-cli-1.13.0-SNAPSHOT.jar and the deps on the classpath.
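
For illustration, the workaround described above could be scripted roughly as follows. This is only a sketch under a few assumptions that are not stated in the thread: it is run from the parquet-cli directory, the dependencies have already been copied to target/dependency (for example with `mvn dependency:copy-dependencies`), and the version string matches your local build. The main class name is taken from the stack trace in the report.

```shell
# Sketch: run parquet-cli without Hadoop, keeping the -runtime jar off the classpath.
# Assumption: current directory is parquet-cli and deps live in target/dependency.

VERSION="1.13.0-SNAPSHOT"                     # adjust to the version you built
CLI_JAR="target/parquet-cli-${VERSION}.jar"   # the plain jar, not the -runtime one
DEPS_DIR="target/dependency"                  # assumed location of copied dependencies

# Name the plain jar explicitly instead of using 'target/*', so that
# parquet-cli-${VERSION}-runtime.jar is never picked up.
# The '*' stays quoted: the JVM expands it, not the shell.
CP="${CLI_JAR}:${DEPS_DIR}/*"

if [ -f "$CLI_JAR" ]; then
  # Pass any script arguments through, e.g.: cat some-file.parquet
  java -cp "$CP" org.apache.parquet.cli.Main "$@"
else
  echo "Not found: $CLI_JAR (build parquet-cli first)" >&2
fi
```

Saved under a hypothetical name such as run-cli.sh, it would be invoked as `sh run-cli.sh cat ../../testdata/dictionaryEncodingSample.parquet`. The only difference from a wildcard classpath over target/ is that the runtime jar is left out.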