apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0
1.95k stars 979 forks source link

Question regarding the jdbc driver with Java > 8 #2882

Closed Der-Joscha closed 5 months ago

Der-Joscha commented 9 months ago

Hello,

I am currently working on an application with java 21 that uses the jdbc driver from drill to execute distributed sql queries. Since the module system was introduced with Java 9, we get exceptions on startup that drill cannot patch various dependencies (see stacktrace below). I am aware that drill tries to access classes that are not exported via reflection, which fails for obvious reasons.

What I already did:

Our setup is the following:

Is there a possibility to use the drill jdbc driver with a java version > 8, or are we stuck with our workaround? I will happily provide more information if needed. Thanks in advance.

Stacktrace org.apache.drill.common.util.ProtobufPatcher patchByteString 2024-02-26T12:45:13.072736994Z WARNING: Unable to patch Protobuf. 2024-02-26T12:45:13.072742894Z java.lang.reflect.InaccessibleObjectException: Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @edc0eb6 2024-02-26T12:45:13.072747295Z at java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:391) 2024-02-26T12:45:13.072750695Z at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:367) 2024-02-26T12:45:13.072753895Z at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:315) 2024-02-26T12:45:13.072756995Z at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:203) 2024-02-26T12:45:13.072759795Z at java.base/java.lang.reflect.Method.setAccessible(Method.java:197) 2024-02-26T12:45:13.072762695Z at javassist.util.proxy.SecurityActions.setAccessible(SecurityActions.java:159) 2024-02-26T12:45:13.072765296Z at javassist.util.proxy.DefineClassHelper$JavaOther.defineClass(DefineClassHelper.java:213) 2024-02-26T12:45:13.072767696Z at javassist.util.proxy.DefineClassHelper$Java11.defineClass(DefineClassHelper.java:52) 2024-02-26T12:45:13.072770496Z at javassist.util.proxy.DefineClassHelper.toClass(DefineClassHelper.java:260) 2024-02-26T12:45:13.072774596Z at javassist.ClassPool.toClass(ClassPool.java:1240) 2024-02-26T12:45:13.072792697Z at javassist.ClassPool.toClass(ClassPool.java:1098) 2024-02-26T12:45:13.072796597Z at javassist.ClassPool.toClass(ClassPool.java:1056) 2024-02-26T12:45:13.072799497Z at javassist.CtClass.toClass(CtClass.java:1298) 2024-02-26T12:45:13.072802398Z at org.apache.drill.common.util.ProtobufPatcher.patchByteString(ProtobufPatcher.java:77) 2024-02-26T12:45:13.072805198Z at org.apache.drill.common.util.ProtobufPatcher.patch(ProtobufPatcher.java:48) 2024-02-26T12:45:13.072808098Z at org.apache.drill.jdbc.Driver.(Driver.java:46) 2024-02-26T12:45:13.072811198Z at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method) 2024-02-26T12:45:13.072813898Z at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1160) 2024-02-26T12:45:13.072816798Z at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.ensureClassInitialized(MethodHandleAccessorFactory.java:300) 2024-02-26T12:45:13.072820999Z at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(MethodHandleAccessorFactory.java:103) 2024-02-26T12:45:13.072823999Z at java.base/jdk.internal.reflect.ReflectionFactory.newConstructorAccessor(ReflectionFactory.java:200) 2024-02-26T12:45:13.072827499Z at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Constructor.java:549) 2024-02-26T12:45:13.072830299Z at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) 2024-02-26T12:45:13.072832999Z at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486) 2024-02-26T12:45:13.072835899Z at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:789) 2024-02-26T12:45:13.072838699Z at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:729) 2024-02-26T12:45:13.072841400Z at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1403) 2024-02-26T12:45:13.072844300Z at java.sql/java.sql.DriverManager$2.run(DriverManager.java:619) 2024-02-26T12:45:13.072847100Z at java.sql/java.sql.DriverManager$2.run(DriverManager.java:599) 2024-02-26T12:45:13.072850200Z at java.base/java.security.AccessController.doPrivileged(AccessController.java:319) 2024-02-26T12:45:13.072853000Z at java.sql/java.sql.DriverManager.ensureDriversInitialized(DriverManager.java:599) 2024-02-26T12:45:13.072855900Z at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:671) 2024-02-26T12:45:13.072858801Z at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:253)
cgivre commented 9 months ago

@Der-Joscha This is interesting. Drill is tested against Java 8, 11, and 17. I had always assumed the same is true of Drill's JDBC driver, but it seems now that might not be the case?

Der-Joscha commented 9 months ago

Thanks for your fast reply, I did not find an alternative artifact in the mvnrepository that indicates it is made for java 11 or 17. Do you know who I can ask, if this is not the right place?

cgivre commented 9 months ago

@Der-Joscha This is the right place, and we can help you out with this. I'm wondering if we are simply not updating what is in maven.

@jnturton Do you know whether the JDBC driver gets updated in mvncentral when we do updates?

cgivre commented 9 months ago

@Der-Joscha One thing... Drill's JDBC driver is located here 1. You might try building and running that yourself with Java 17 and seeing if that works.

Der-Joscha commented 8 months ago

@cgivre Is there anything I should have done except changing maven.compiler.source and maven.compiler.target to 17? With this configuration, it seems, that --add-opens java.base/java.lang=ALL-UNNAMED is still required (regardless of whether there is module-info in my source tree).

jnturton commented 8 months ago

Hi. To this day, every binary distribution of Drill released by the project, including the published Maven artifacts, maintains JDK 8 compatibility. Even when that distro is bundled with a newer JDK, e.g. the Docker image based on JDK 17.

We are using maven as a build system and have org.apache.drill.exec:drill-jdbc:jar:1.21.0 as a dependency

@Der-Joscha, please use the artifact org.apache.drill.exec:drill-jdbc-all:jar:1.21.0 instead, this is the JDBC driver repackaged for distribution.

Drill is tested against Java 8, 11, and 17. I had always assumed the same is true of Drill's JDBC driver, but it seems now that might not be the case?

It is the case. The repackaged driver, jdbc-all, is only minimally tested but I, for one, use it locally under Java 17.

Do you know whether the JDBC driver gets updated in mvncentral when we do updates?

It does, here are the artifacts for 1.21.1.

@Der-Joscha let us know if the module access errors are resolved when you switch to jdbc-all.

Der-Joscha commented 8 months ago

@jnturton Does drill-jdbc-all support modules? When use the drill-jdbc-all artifact, maven complains about the module drill.jdbc.all not being present. I had no issues at compile time with drill-jdbc and requires drill.jdbc, which does not work with drill-jdbc-all.

jnturton commented 8 months ago

There aren't any Java modules delcared in Drill so I'm not sure what requires drill.jdbc could be doing.

Der-Joscha commented 8 months ago

@jnturton According to my build system, it is an automatic module generated from drill-jdbc-1.12.0.jar. As soon, as I am using a module-info for my own application, it is created (and must be requried) by the module system. While this works with drill-jdbc (as long as the jvm is invoked with --add-opens java.base/java.lang=ALL-UNNAMED), when using drill-jdbc-all instead, the automatic module is not found and omitting it in the module-infoleads to the following compile error: package org.apache.drill.jdbc is not visible. Is there a serious disadvantage, if I use drill-jdbc instead of drill-jdbc-all?

jnturton commented 8 months ago

I see, I'm learning things here! The advantages of drill-jdbc-all are

JDBC drivers aren't normally made compile time dependencies since user programs only access the public JDBC API. Are setting the dependency to <scope>runtime</scope> in your pom.xml?

Der-Joscha commented 8 months ago

Currently we are using compile scope, as we would like to use drill beyond the default jdbc api and submit plans instead of sql queries to drill. is it possible to achieve this with the jdbc api?

jnturton commented 8 months ago

Hmm, I would have to go and check but my bet for now is that it isn't. It's certainly possible to submit plans through the REST API but that's not as nice, or fast, if you want to work with query result sets in Java. If you can't do it through JDBC you might want to work directly with the the Drill Java client.

timillar commented 8 months ago

Hello,

I’ve been dealing with a similar issue last couple of days, so I thought maybe some of the things I’ve figured out would be helpful.

Drill jdbc driver tries to patch protobuf and guava libraries and fails due to java internals being closed to reflection by default since java 16. These exceptions are logged (as WARNs), and are actually being swallowed by the ProtobufPatcher / GuavaPatcher, so the first question is:

These patchers also use Javassist’s toClass() without parameters, and I think they recommend to use toClass(Class<?> neighbor) now, to deal with illegal reflective access (https://github.com/jboss-javassist/javassist/issues/369) - though I’m not sure how helpful it will be here.

In my case (trying to run drill-jdbc in quarkus) I had to disable the Patchers explicitly, due to how javassist’s class patching conflicted with quarkus’ own classloader. Disabling them also kinda fixes java’s issues with illegal reflective access. But obviously, this is a very rough hack that I did just to have it working for now.

//must run before drill’s driver was loaded 
var protobufPatchingAttempted = ProtobufPatcher.class.getDeclaredField("patchingAttempted");

protobufPatchingAttempted.setAccessible(true);

protobufPatchingAttempted.setBoolean(null, true);



var guavaPatchingAttempted = GuavaPatcher.class.getDeclaredField("patchingAttempted");

guavaPatchingAttempted.setAccessible(true);

guavaPatchingAttempted.setBoolean(null, true);

I think everything so far doesn’t depend on having module-info in your project, and is caused by changes in reflective access checks since java 16.

As for named modules - while drill-jdbc works as an automatic module, it has some older dependencies that might cause conflicts down the road (e.g., quarkus requires netty 4.1+, but drill-memory-base depends on netty-buffer PoolArena from some older version, and fails with ‘java.lang.NoSuchFieldError: Class io.netty.buffer.PoolArena does not have member field 'int chunkSize’’).

And using drill-jdbc-all as a compile dependency with a named module is impossible, since it doesn’t shadow all the packages under the oadd umbrella, and sooner or later you’ll run into split package issue (e.g., ‘Module ‘your.module.name’ reads package 'org.w3c.dom' from both 'java.xml' and 'drill.jdbc.all'’ , when you add requires java.sql and requires drill.jdbc.all together). So this one can only be used as a runtime dependency.

Der-Joscha commented 8 months ago

@jnturton Currently we are working with a DrillConnection unwrapped from a call to DriverManager.getConnection. On this DrillConnection we call getClient(). Is there a difference in obtaining a DrillClient via jdbc?

Der-Joscha commented 8 months ago

@timillar Thanks for your comment, I am glad that I am not the only one with those kind of issues. What you describe matches my experience and is the reason why I chose drill-jdbc instead of drill-jdbc-all. It seems, that drill is able to patch (at least some of) its dependencies with --add-opens java.base/java.lang=ALL-UNNAMED with drill-jdbc.

how important are these patchers for the latest java / drill versions? If one can live without them, then the simplest 'solution' would be to ignore the warn logs from these classes.

I asked myself that too, as drill seems to run regardless.

jnturton commented 8 months ago

Currently we are working with a DrillConnection unwrapped from a call to DriverManager.getConnection. On this DrillConnection we call getClient(). Is there a difference in obtaining a DrillClient via jdbc?

This sounds like it should work - let me know!

how important are these patchers for the latest java / drill versions?

We actually recently eliminated Drill's patched Guava in the master branch (see #2827). Whether any code paths followed in JDBC driver would have touched patching would need analysis but if you're happy to work from a development release for a bit then you could could develop against 1.22.0-SNAPSHOT instead of 1.21.1. You shouldn't need to upgrade your Drill server, the 1.22.0-SNAPSHOT JDBC driver should be compatible with 1.21.1 (or what you've got, so long as it's not too old).

https://repository.apache.org/content/repositories/snapshots/org/apache/drill/

Der-Joscha commented 7 months ago

I am deeply sorry, but it seems, I still need your help with this. I switched to 1.22.0-SNAPSHOT as you suggested, but it seems, this does not change anything for me.

pom.xml

<dependency>
            <groupId>org.apache.drill.exec</groupId>
            <artifactId>drill-jdbc-all</artifactId>
            <version>1.22.0-SNAPSHOT</version>
</dependency>

with

repositories>
        <repository>
            <id>apache-snapshots</id>
            <url>https://repository.apache.org/content/repositories/snapshots</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

This produces the following logs: drill-patching note: I passed --add-opens java.base/java.lang=ALL-UNNAMED to the jvm to allow drill patching Stopwatch and Closeables

Is there anything I have missed?

jnturton commented 5 months ago

Sorry for leaving this for so long, and also for misremembering! Drill does still patch Guava, even though we eliminated its shaded copy of Guava. I've promoted this to DRILL-8497 to be fixed in the next release (after 1.21.2) so I'll close here (but please feel free to discuss further on the Jira issue, in Slack, etc.).