apache / arrow-cookbook

Apache Arrow Cookbook
https://arrow.apache.org/
Apache License 2.0
95 stars 46 forks source link

[Java] Java Cookbook fails on 16.0.0-SNAPSHOT #347

Closed amoeba closed 3 months ago

amoeba commented 5 months ago

When I run make javatest from the latest main with ARROW_NIGHTLY=1, the Flight cookbook fails with,

Running with Arrow version: 16.0.0-SNAPSHOT
...snip...
Exception java.lang.NoClassDefFoundError: io/grpc/BindableService
      at FlightServer.builder (FlightServer.java:169)
      at (#39:4)
Caused by: java.lang.ClassNotFoundException: io.grpc.BindableService
      at BuiltinClassLoader.loadClass (BuiltinClassLoader.java:641)
      at ClassLoaders$AppClassLoader.loadClass (ClassLoaders.java:188)
      at ClassLoader.loadClass (ClassLoader.java:525)
      ...

I'm not sure what might be causing this but I wonder if it's related to the recent JPMS changes. @jduo do you have any ideas? In case you're not familiar with how the Java cookbooks run, they essentially,

  1. Generate a classpath using a stub pom.xml in https://github.com/apache/arrow-cookbook/blob/main/java/source/demo/pom.xml and by calling maven, https://github.com/apache/arrow-cookbook/blob/83aff2d04e3ef4f70895475e52e01d1c974e793c/java/ext/javadoctest.py#L53-L58
  2. Runs each code snippet with jshell using the above classpath via https://github.com/apache/arrow-cookbook/blob/83aff2d04e3ef4f70895475e52e01d1c974e793c/java/ext/javadoctest.py#L80

TODOs from the thread below:

jduo commented 5 months ago

I haven't seen this error.

It could be related to JPMS. We also updated gRPC's few times, merged some flight modules into one module, and changed some of the shading in Flight (though that may have only affected clients and not the server).

I'll take a peek at the cookbook and let you know how it goes @amoeba

amoeba commented 5 months ago

Thanks so much @jduo.

jduo commented 5 months ago

@amoeba , I was able to make this work by making some changes to the POM. This seems like the wrong way to resolve the issue though -- we seem to have some problems getting transitive dependencies.

In the demo pom.xml:

@lidavidm , do you know if the flight-core artifact is supposed to be the shaded classifier by default? And any ideas why we might not be inheriting transitive dependencies anymore?

lidavidm commented 5 months ago

I don't think shading is meant to be the default. I didn't even realize we shipped a separate shaded version. Regardless, can we just compare 15.0.0 and 16.0.0 and see what might have changed?

lidavidm commented 5 months ago

In fact, https://github.com/apache/arrow/commit/92682f0f6064224acd6dd746ac45d6df4b1963c4 seems to be the only commit in 16.0.0 touching flight-core.

jduo commented 5 months ago

When using the unshaded flight-core artifact I add grpc dependencies explicitly:

        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-api</artifactId>
            <version>1.63.0</version>
        </dependency>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-netty</artifactId>
            <version>1.63.0</version>
        </dependency>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-stub</artifactId>
            <version>1.63.0</version>
        </dependency>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-protobuf</artifactId>
            <version>1.63.0</version>
        </dependency>

and this leads to the following error:

Exception java.lang.IllegalAccessError: class org.apache.arrow.flight.impl.Flight$FlightDescriptor tried to access method 'com.google.protobuf.LazyStringArrayList com.google.protobuf.LazyStringArrayList.emptyList()' (org.apache.arrow.flight.impl.Flight$FlightDescriptor and com.google.protobuf.LazyStringArrayList are in unnamed module of loader 'app')
      at Flight$FlightDescriptor.<init> (Flight.java:7034)
      at Flight$FlightDescriptor.<clinit> (Flight.java:7738)
      at FlightServiceGrpc.getGetFlightInfoMethod (FlightServiceGrpc.java:106)
      at FlightServiceGrpc.getServiceDescriptor (FlightServiceGrpc.java:1180)
      at FlightServiceGrpc.bindService (FlightServiceGrpc.java:1059)
      at FlightServiceGrpc$FlightServiceImplBase.bindService (FlightServiceGrpc.java:558)
      at FlightBindingService.bindService (FlightBindingService.java:96)
      at ServerInterceptors.intercept (ServerInterceptors.java:93)
      at FlightServer$Builder.build (FlightServer.java:306)
      at (#39:4)

I'm looking into this. However it's worth noting that using dependency:build-classpath doesn't separate out JARs that should go on the module-path (ie JPMS modules) vs. JARs that should go on the classpath.

jshell requires you to explicitly put JARs that are to be used as JPMS modules on the module-path parameter. Everything put on the classpath goes into the unnamed module, which is why in the error above, FlightDescriptor is being shown as part of the unnamed module instead of the org.apache.flight.core module.

This differs from running tests from maven, where maven inspects JAR metadata to figure out if they support JPMS or not.

lidavidm commented 5 months ago

Hmm so are we just invoking JShell wrongly then?

jduo commented 5 months ago

I was able to get past the above error by adding an explicit dependency on protobuf to the demo pom too:

        <dependency>
            <groupId>com.google.protobuf</groupId>
            <artifactId>protobuf-java</artifactId>
            <version>3.23.1</version>
        </dependency>

It seems that unnamed modules cannot access other dependencies in the unnamed module that have been brought in transitively, but can when they have been brought in explicitly: https://stackoverflow.com/questions/70439672/illegalaccesserror-classa-and-class-b-are-in-unnamed-module-of-loader-app

Hmm so are we just invoking JShell wrongly then?

The right thing to do is likely to correctly build module-path and classpath separately when using Arrow 16+. I don't see a friendly way to do this compared to using the build-classpath target though. Maybe jshell isn't the way to go too and we should run via maven.

We probably should step back and think about what we want to support. For example do we intend to allow for users to put Arrow in the unnamed module? If so, how do we make this friendlier -- alternate artifacts that skip adding module-info.java perhaps.

lidavidm commented 5 months ago

I'd rather not try to support every possible permutation of ways to do things - unless there's a real need from other users then I'd vote that we try to fix what's happening here instead of adding yet another artifact upstream.

lidavidm commented 5 months ago

I think we chose JShell here initially because it had less ceremony required (in particular not requiring nesting everything in a class/main method). If we have to move back to using "regular" classes then so be it.

amoeba commented 5 months ago

Since it sounds like there's not an easy way to tweak the current setup (using maven to pass the right info to jshell), it seems like refactoring the Java cookbook to use something like mvn exec:java might be the long-term fix here. Though I think we could keep the examples as-is and avoid wrapping them in a class if we just use some string manipulation.

Does that seems like a good plan? I'm happy to take that on.

jduo commented 5 months ago

Yeah that sounds like a good solution. It would work with both pre and post-JPMS builds. I haven't checked if mvn exec:java would automatically put modules on module-path like surefire does though.

amoeba commented 4 months ago

Today I started in on a rewrite of javadoctest.py that uses exec:java. I have the basic shell of it working so I think the approach will work. What I'm doing so far is basically:

I should have a PR up early next week.

amoeba commented 3 months ago

I put a PR up at https://github.com/apache/arrow-cookbook/pull/350 that refactors the Java cookbooks to use Maven instead of JShell.