apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

Offer an un-shaded version of the Flight SQL JDBC driver #37892

Open rkennedy-mode opened 1 year ago

rkennedy-mode commented 1 year ago

Describe the enhancement requested

The Arrow Flight SQL JDBC driver shades and relocates all of its Maven dependencies into the driver JAR. While this is probably desirable for folks who download the JAR and install it into whatever tool they use to connect to a Flight SQL server, it causes problems for those who embed the driver within their own service.

As an example, we're using OpenTelemetry's Java auto-instrumentation agent. This agent modifies bytecode as classes are loaded to add OpenTelemetry instrumentation for distributed tracing, and it has built-in support for gRPC clients and servers. Unfortunately, because the Arrow Flight SQL JDBC driver shades and relocates the gRPC libraries, those classes no longer use the package names the Java agent expects. Consequently, OpenTelemetry does not instrument the client calls and cannot propagate context from the client to the server.
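To make the failure mode concrete, here is a minimal Java sketch under simplifying assumptions. The relocate method mimics a shade-plugin relocation rule, and agentWouldInstrument is a hypothetical stand-in for the agent's type matching, reduced to a package-prefix check (the real OpenTelemetry matchers are more specific). The "shaded." prefix is a placeholder, not the prefix the Arrow driver actually uses.

```java
import java.util.Arrays;

public class RelocationDemo {
    // Hypothetical relocation prefix; a placeholder, not the driver's actual prefix.
    static final String RELOCATION_PREFIX = "shaded.";

    // Mimics a shade-plugin relocation rule that rewrites io.grpc.* to <prefix>io.grpc.*.
    static String relocate(String className) {
        return className.startsWith("io.grpc.") ? RELOCATION_PREFIX + className : className;
    }

    // Hypothetical, simplified stand-in for an agent type matcher that selects
    // classes by their fully qualified name.
    static boolean agentWouldInstrument(String className) {
        return className.startsWith("io.grpc.");
    }

    public static void main(String[] args) {
        for (String original : Arrays.asList("io.grpc.ManagedChannel", "io.grpc.ClientCall")) {
            String relocated = relocate(original);
            System.out.printf("%-36s instrumented=%b%n", original, agentWouldInstrument(original));
            System.out.printf("%-36s instrumented=%b%n", relocated, agentWouldInstrument(relocated));
        }
    }
}
```

The original names report instrumented=true and the relocated names report instrumented=false: the bytecode is the same, but any matcher keyed on the io.grpc package never sees it.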

Would it be possible to publish both a shaded+relocated version of the driver and an un-shaded version of the driver?

The Maven Shade plugin supports building and publishing both artifacts via the shadedArtifactAttached configuration option. The JDBC driver currently has this option explicitly disabled, for reasons that aren't clear.

Component(s)

Java

rkennedy-mode commented 1 year ago

As a test, I made the following change to the flight-sql-jdbc-driver POM:

diff --git a/java/flight/flight-sql-jdbc-driver/pom.xml b/java/flight/flight-sql-jdbc-driver/pom.xml
index 1fd9222be..08e47187f 100644
--- a/java/flight/flight-sql-jdbc-driver/pom.xml
+++ b/java/flight/flight-sql-jdbc-driver/pom.xml
@@ -172,7 +172,8 @@
                             <goal>shade</goal>
                         </goals>
                         <configuration>
-                            <shadedArtifactAttached>false</shadedArtifactAttached>
+                            <shadedArtifactAttached>true</shadedArtifactAttached>
+                            <shadedClassifierName>shaded</shadedClassifierName>
                             <createDependencyReducedPom>false</createDependencyReducedPom>
                             <minimizeJar>false</minimizeJar>
                             <artifactSet>

I then built the entire Java tree with the following command:

JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64 mvn -Dmaven.repo.local=/tmp/arrow/.maven_repo clean install

Looking in the local Maven repository, I see the following:

root@0a9e22d01fcb:/tmp/arrow/java# ls -l ../.maven_repo/org/apache/arrow/flight-sql-jdbc-driver/14.0.0-SNAPSHOT/
total 35416
-rw-r--r-- 1 root root      444 Sep 27 17:57 _remote.repositories
-rw-r--r-- 1 root root   125160 Sep 27 17:56 flight-sql-jdbc-driver-14.0.0-SNAPSHOT-cyclonedx.json
-rw-r--r-- 1 root root   111299 Sep 27 17:56 flight-sql-jdbc-driver-14.0.0-SNAPSHOT-cyclonedx.xml
-rw-r--r-- 1 root root 35978036 Sep 27 17:57 flight-sql-jdbc-driver-14.0.0-SNAPSHOT-shaded.jar
-rw-r--r-- 1 root root    10394 Sep 27 17:56 flight-sql-jdbc-driver-14.0.0-SNAPSHOT-tests.jar
-rw-r--r-- 1 root root    11230 Sep 27 17:56 flight-sql-jdbc-driver-14.0.0-SNAPSHOT.jar
-rw-r--r-- 1 root root    10716 Sep 27 17:40 flight-sql-jdbc-driver-14.0.0-SNAPSHOT.pom
-rw-r--r-- 1 root root     1553 Sep 27 17:57 maven-metadata-local.xml

This seems workable for most people. Folks using Maven, Gradle, or similar tools to manage their projects would switch to pulling the un-shaded artifact along with its dependencies. Folks who need the shaded JAR would instead have to specify the shaded classifier to get that artifact.
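To check which of the built JARs actually carries relocated gRPC classes, a small sketch like the following could count class entries by package path using java.util.jar.JarFile. The class name JarPackageCheck and the default relocated prefix "shaded/io/grpc/" are hypothetical; pass the driver's real relocated path as the second argument.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarPackageCheck {
    // Counts .class entries whose path inside the JAR starts with pathPrefix, e.g. "io/grpc/".
    static int countClasses(String jarPath, String pathPrefix) throws IOException {
        int count = 0;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(pathPrefix) && name.endsWith(".class")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: JarPackageCheck <path-to-jar> [relocated-path-prefix]");
            System.exit(2);
        }
        String jarPath = args[0];
        // Hypothetical default; the real relocated prefix should be supplied explicitly.
        String relocatedPrefix = args.length > 1 ? args[1] : "shaded/io/grpc/";
        System.out.println("unrelocated io/grpc classes: " + countClasses(jarPath, "io/grpc/"));
        System.out.println("classes under " + relocatedPrefix + ": " + countClasses(jarPath, relocatedPrefix));
    }
}
```

Running it against both flight-sql-jdbc-driver-14.0.0-SNAPSHOT.jar and flight-sql-jdbc-driver-14.0.0-SNAPSHOT-shaded.jar would show where the gRPC classes ended up. Given that the unclassified JAR is only about 11 KB, one would expect it to contain no gRPC classes at all, with gRPC arriving as a regular dependency instead.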