Homebrew / homebrew-core

🍻 Default formulae for the missing package manager for macOS (or Linux)
https://brew.sh
BSD 2-Clause "Simplified" License

parquet-cli has an extra dependency jar #172168

Open mkmik opened 1 month ago

mkmik commented 1 month ago

brew gist-logs <formula> link OR brew config AND brew doctor output

HOMEBREW_VERSION: 4.3.0-93-ge0bc557
ORIGIN: https://github.com/Homebrew/brew
HEAD: e0bc557e7b991cb23583679e1cf1c8a92b793aeb
Last commit: 3 hours ago
Core tap HEAD: 51e6e876bfa1ee5bfda14b0b9d56d9546b231c8e
Core tap last commit: 6 days ago
Core tap JSON: 20 May 11:52 UTC
Core cask tap HEAD: 8f9e1d376a3ea346f9d5b156d7a9789c73b1f2a9
Core cask tap last commit: 6 days ago
Core cask tap JSON: 20 May 11:52 UTC
HOMEBREW_PREFIX: /opt/homebrew
HOMEBREW_CASK_OPTS: []
HOMEBREW_DISPLAY: /private/tmp/com.apple.launchd.TZMewcwAfn/org.xquartz:0
HOMEBREW_EDITOR: code -g -w
HOMEBREW_MAKE_JOBS: 16
HOMEBREW_SORBET_RUNTIME: set
Homebrew Ruby: 3.3.1 => /opt/homebrew/Library/Homebrew/vendor/portable-ruby/3.3.1/bin/ruby
CPU: 16-core 64-bit arm_palma
Clang: 15.0.0 build 1500
Git: 2.45.1 => /opt/homebrew/bin/git
Curl: 8.4.0 => /usr/bin/curl
macOS: 14.3-arm64
CLT: 15.3.0.0.1.1708646388
Xcode: 15.3
Rosetta 2: false

What were you trying to do (and why)?

I'm trying to use the parquet CLI tool. Example invocation:

parquet schema foo.parquet

What happened (include all command output)?

Exception in thread "main" java.lang.NoSuchMethodError: 'shaded.parquet.org.apache.avro.Schema org.apache.parquet.avro.AvroSchemaConverter.convert(org.apache.parquet.schema.MessageType)'
        at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:85)
        at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:396)
        at org.apache.parquet.cli.commands.SchemaCommand.getSchema(SchemaCommand.java:104)
        at org.apache.parquet.cli.commands.SchemaCommand.run(SchemaCommand.java:82)
        at org.apache.parquet.cli.Main.run(Main.java:163)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
        at org.apache.parquet.cli.Main.main(Main.java:191)

What did you expect to happen?

{
  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "foo",
    "type" : "string"
  }]
}

Step-by-step reproduction instructions (by running brew commands)

* `brew install parquet-cli`
* `parquet schema foo.parquet`

Workaround

The issue can be fixed by removing an extra jar the package accidentally installs.

rm "$(brew --cellar)/parquet-cli/1.14.0/libexec/parquet-avro-1.14.0.jar"
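
The command above hardcodes version 1.14.0. A minimal sketch of a version-independent variant follows. It assumes the extra jar always matches `parquet-avro-*.jar` under the keg's `libexec` directory, and the helper name `list_extra_avro_jars` is hypothetical:

```shell
# Hypothetical helper: print every parquet-avro jar found under a parquet-cli keg.
# Assumes the layout <cellar>/parquet-cli/<version>/libexec/parquet-avro-*.jar.
list_extra_avro_jars() {
  cellar_dir="$1"
  for jar in "$cellar_dir"/parquet-cli/*/libexec/parquet-avro-*.jar; do
    # Skip the literal pattern when the glob matches nothing.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
    fi
  done
}

# Usage (review the printed paths before deleting anything):
#   list_extra_avro_jars "$(brew --cellar)"
#   list_extra_avro_jars "$(brew --cellar)" | while IFS= read -r jar; do rm "$jar"; done
```

Listing first and deleting second keeps the workaround reviewable. A reinstall or upgrade of the formula would presumably restore the jar.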

The upstream ticket https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-2142?filter=allissues explains the cause: the avro dependency is bundled in the main parquet-cli jar, but the signature of the method has been altered there, which makes it incompatible with the separate parquet-avro jar. The parquet-cli formula pulls in that jar as a dependency and installs it, but it shouldn't.
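
To check whether two jars on the classpath really ship the same class, one option is to list each jar's contents. The sketch below assumes `unzip` is available and that the conflicting class is `org/apache/parquet/avro/AvroSchemaConverter.class` (inferred from the stack trace, not confirmed against the jars). The helper name `jars_containing_class` is hypothetical:

```shell
# Hypothetical helper: print each jar in a directory whose listing mentions a class file.
# The match is a plain substring match on the `unzip -l` output.
jars_containing_class() {
  jar_dir="$1"
  class_file="$2"   # e.g. org/apache/parquet/avro/AvroSchemaConverter.class
  for jar in "$jar_dir"/*.jar; do
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF "$class_file"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Usage:
#   jars_containing_class "$(brew --cellar)/parquet-cli/1.14.0/libexec" \
#     org/apache/parquet/avro/AvroSchemaConverter.class
```

If more than one jar is printed, the JVM may load either copy of the class, which would be consistent with the `NoSuchMethodError` above.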

SMillerDev commented 1 month ago

Can't we use this as a patch? https://github.com/apache/parquet-mr/commit/62b774cd0f0c60cfbe540bbfa60bee15929af5d4

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

mkmik commented 2 weeks ago

> Can't we use this as a patch? apache/parquet-java@62b774c

Well, yes, although that's just a patch to the README, so you'd have to interpret what it says and make the corresponding changes in the formula.