datastax / cassandra-quarkus

An Apache Cassandra(R) extension for Quarkus
Apache License 2.0
40 stars 28 forks source link

[JAVA-2616] Ensure Native.isCurrentTimeMicrosAvailable reports correct state #55

Closed adutra closed 4 years ago

adutra commented 4 years ago

Imported from [JAVA-2616](https://datastax-oss.atlassian.net/browse/JAVA-2616), originally reported by adutra on Mon, 20 Jan 2020 17:08:32 GMT.


Under GraalVM, the driver is unable to instantiate AtomicTimestampGenerator:

java.lang.UnsatisfiedLinkError: com.kenai.jffi.Foreign.invokeN2O1(JJJJLjava/lang/Object;III)J [symbol: Java_com_kenai_jffi_Foreign_invokeN2O1 or Java_com_kenai_jffi_Foreign_invokeN2O1__JJJJLjava_lang_Object_2III]
    at com.oracle.svm.jni.access.JNINativeLinkage.getOrFindEntryPoint(JNINativeLinkage.java:145)
    at com.oracle.svm.jni.JNIGeneratedMethodSupport.nativeCallAddress(JNIGeneratedMethodSupport.java:57)
    at com.kenai.jffi.Foreign.invokeN2O1(Foreign.java)
    at com.kenai.jffi.Invoker.invokeN2(Invoker.java:1038)
    at com.datastax.oss.driver.internal.core.os.Native$LibCLoader$LibC$jnr$ffi$2.gettimeofday(Unknown Source)
    at com.datastax.oss.driver.internal.core.os.Native.currentTimeMicros(Native.java:57)
    at com.datastax.oss.driver.internal.core.time.NativeClock.fetchTimeMicros(NativeClock.java:74)
    at com.datastax.oss.driver.internal.core.time.NativeClock.<init>(NativeClock.java:57)
    at com.datastax.oss.driver.internal.core.time.Clock.getInstance(Clock.java:41)
    at com.datastax.oss.driver.internal.core.time.MonotonicTimestampGenerator.buildClock(MonotonicTimestampGenerator.java:109)
    at com.datastax.oss.driver.internal.core.time.MonotonicTimestampGenerator.<init>(MonotonicTimestampGenerator.java:43)
    at com.datastax.oss.driver.internal.core.time.AtomicTimestampGenerator.<init>(AtomicTimestampGenerator.java:52)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:229)

It seems that Native.isCurrentTimeMicrosAvailable() is returning true here, but the native call isn't available.

We should 1) investigate what's happening and 2) maybe also improve the implementation of com.datastax.oss.driver.internal.core.time.Clock#getInstance to catch LinkageError and automatically switch to Java clocks.

┆Issue is synchronized with this Jira Task by Unito

adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @phact on Mon, 20 Jan 2020 17:28:58 GMT.


I confirm that the workaround to `force-java-clock` circumvents the error 🙂

datastax-java-driver { advanced.timestamp-generator.force-java-clock = true}

in `application.conf`

I’ll pull debug logs to see if we can find out why it’s not switching to java clocks automatically.

adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @phact on Mon, 20 Jan 2020 18:44:15 GMT.


Here we go:

13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.retry.DefaultRetryPolicy
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a SpeculativeExecutionPolicy from config option advanced.speculative-execution-policy.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Building from unqualified name NoSpeculativeExecutionPolicy
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.specex.NoSpeculativeExecutionPolicy
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a AddressTranslator from config option advanced.address-translator.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Building from unqualified name PassThroughAddressTranslator
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.addresstranslation.PassThroughAddressTranslator
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a NodeStateListener from config option advanced.node-state-listener.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Building from unqualified name NoopNodeStateListener
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.metadata.NoopNodeStateListener
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a RequestTracker from config option advanced.request-tracker.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Building from unqualified name NoopRequestTracker
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.tracker.NoopRequestTracker
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a RequestThrottler from config option advanced.throttler.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Building from unqualified name PassThroughRequestThrottler
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.session.throttling.PassThroughRequestThrottler
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a AuthProvider from config option advanced.auth-provider.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Option is not defined, skipping
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a SslEngineFactory from config option advanced.ssl-engine-factory.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Option is not defined, skipping
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a TimestampGenerator from config option advanced.timestamp-generator.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Building from unqualified name AtomicTimestampGenerator
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.time.AtomicTimestampGenerator
13:42:29 INFO  [co.da.os.dr.in.co.ti.Clock] (s0-admin-0) Using native clock for microsecond precision
13:42:29 DEBUG [co.da.os.dr.in.co.co.EventBus] (s0-admin-0) [s0] Registering com.datastax.oss.driver.internal.core.metadata.LoadBalancingPolicyWrapper$$Lambda$bf70e5ca416854d866be64436cd33a1fab9e8815@7f07db471b30 for class com.datastax.oss.driver.internal.core.metadata.NodeStateEvent
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Creating a TimestampGenerator from config option advanced.timestamp-generator.class
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Building from unqualified name AtomicTimestampGenerator
13:42:29 DEBUG [co.da.os.dr.in.co.ut.Reflection] (s0-admin-0) Trying with default package com.datastax.oss.driver.internal.core.time.AtomicTimestampGenerator
13:42:29 INFO  [co.da.os.dr.in.co.ti.Clock] (s0-admin-0) Using native clock for microsecond precision
13:42:32 INFO  [pr.ProjectResource] (main) Could not connect to cassandra
java.lang.IllegalArgumentException: Error instantiating class AtomicTimestampGenerator (specified by advanced.timestamp-generator.class): com.kenai.jffi.Foreign.invokeN2O1(JJJJLjava/lang/Object;III)J [symbol: Java_com_kenai_jffi_Foreign_invokeN2O1 or Java_com_kenai_jffi_Foreign_invokeN2O1__JJJJLjava_lang_Object_2III]
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:236)
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:94)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildTimestampGenerator(DefaultDriverContext.java:363)
    at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getTimestampGenerator(DefaultDriverContext.java:732)
    at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.init(DefaultSession.java:327)
    at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.access$1000(DefaultSession.java:280)
    at com.datastax.oss.driver.internal.core.session.DefaultSession.lambda$init$0(DefaultSession.java:126)
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
    at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:73)
    at io.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
    at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:460)
    at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:193)
Caused by: java.lang.UnsatisfiedLinkError: com.kenai.jffi.Foreign.invokeN2O1(JJJJLjava/lang/Object;III)J [symbol: Java_com_kenai_jffi_Foreign_invokeN2O1 or Java_com_kenai_jffi_Foreign_invokeN2O1__JJJJLjava_lang_Object_2III]
    at com.oracle.svm.jni.access.JNINativeLinkage.getOrFindEntryPoint(JNINativeLinkage.java:145)
    at com.oracle.svm.jni.JNIGeneratedMethodSupport.nativeCallAddress(JNIGeneratedMethodSupport.java:57)
    at com.kenai.jffi.Foreign.invokeN2O1(Foreign.java)
    at com.kenai.jffi.Invoker.invokeN2(Invoker.java:1038)
    at com.datastax.oss.driver.internal.core.os.Native$LibCLoader$LibC$jnr$ffi$2.gettimeofday(Unknown Source)
    at com.datastax.oss.driver.internal.core.os.Native.currentTimeMicros(Native.java:57)
    at com.datastax.oss.driver.internal.core.time.NativeClock.fetchTimeMicros(NativeClock.java:74)
    at com.datastax.oss.driver.internal.core.time.NativeClock.<init>(NativeClock.java:57)
    at com.datastax.oss.driver.internal.core.time.Clock.getInstance(Clock.java:41)
    at com.datastax.oss.driver.internal.core.time.MonotonicTimestampGenerator.buildClock(MonotonicTimestampGenerator.java:109)
    at com.datastax.oss.driver.internal.core.time.MonotonicTimestampGenerator.<init>(MonotonicTimestampGenerator.java:43)
    at com.datastax.oss.driver.internal.core.time.AtomicTimestampGenerator.<init>(AtomicTimestampGenerator.java:52)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:229)
    ... 16 more
adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @adutra on Mon, 20 Jan 2020 19:00:39 GMT.


Thank you @phact ! From the logs above we see:

INFO [co.da.os.dr.in.co.ti.Clock] (s0-admin-0) Using native clock for microsecond precision

That’s weird because it shows evidence that the driver could call gettimeofday once when loading the Native class, but a subsequent call to that syscall fails with a linkage error.

adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @absurdfarce on Fri, 13 Mar 2020 16:45:13 GMT.


Just recording observations here, mainly so that I don’t forget them

Underlying cause here looks to be an inability to find the native jffi libs in order to execute the JNI call. Some of the proxies generated by jnr-ffi will also chuck ULE in some cases but by the time we get to the code in the stack trace above we’re already past that point.

The named function is certainly in the underlying JAR, however, as one might expect:

[mersault@localhost tmp]$ cp /work/maven/repo/com/github/jnr/jffi/1.2.19/jffi-1.2.19-native.jar .
[mersault@localhost tmp]$ jar xf jffi-1.2.19-native.jar
[mersault@localhost tmp]$ nm jni/x86_64-Linux/libjffi-1.2.so | grep invokeN2O1
00000000000086f0 T Java_com_kenai_jffi_Foreign_invokeN2O1

So this sure looks to be an outright linkage problem.

It’s worth noting that we don’t see these issues with a basic app running in Graal. https://github.com/absurdfarce/graal-cassandra-driver includes just such an application which loads the jffi native code without issue. So this certainly seems to be something in Quarkus.

The native JARs weren’t originally included as resources in the native image, but even after I add those in and toss in a few other fixes from the discussion in https://datastax-oss.atlassian.net/browse/JAVA-2663 the problem is still there. I even tried a custom driver build with the libc fix mentioned in https://datastax-oss.atlassian.net/browse/JAVA-2663 and it didn’t make any difference, which was expected (since the libc version used shouldn’t impact underling linkage of native code) but reasonable to rule out. https://github.com/absurdfarce/cassandra-quarkus-starter contains the results of that investigation to date.

At this point i don’t have explanations to the following two points:

adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @adutra on Sat, 14 Mar 2020 20:06:11 GMT.


@absurdfarce thanks for this thorough investigation! Wrt to the initial problem:

Native.isCurrentTimeMicrosAvailable() is returning true here, but the native call isn't available.

If I understand you correctly, we still don’t know what’s happening, do we?

adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @absurdfarce on Mon, 16 Mar 2020 04:54:34 GMT.


@adutra Yes, that’s correct.

I believe when I ran this via a debugger I wasn’t seeing the expected deployment of the native lib into the tmp directory. Such a failure would at least explain the current exception, although it just pushes the problem back a step (i.e. why isn’t the lib deploying in that case?) I need to re-test that to confirm, though… that’s all from memory and it’s possible I’ve confused that case with something else I saw.

Even if that is correct, it doesn’t explain why the initial native call succeeds and returns a value of 0 while subsequent calls result in an exception. The static initializers in Native use exactly that test to determine whether gettimeofday() is available; only thing I can figure is that the generated proxy is (fraudulently) returning zero on the initial call as some kind of setup operation and then trying the native linkage from that point forward. But that’s still just speculation; I haven’t been able to nail down the root cause yet.

adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @tomekl007 on Mon, 30 Mar 2020 12:56:43 GMT.


Currently, I’ve implemented native support (quarkus, graal) by setting:TIMESTAMP_GENERATOR_FORCE_JAVA_CLOCK:

https://github.com/datastax/cassandra-quarkus/pull/1/files#diff-c8fa2affa6aa6a83bbe9b43abebe85e6R67

and using @Substitute:

https://github.com/datastax/cassandra-quarkus/pull/1/files#diff-d549fb86e4c13de9dbd13abb41da510bR19

to change the isGetProcessIdAvailable to return always false - due to that fact, the JNR code is never called in the Native setup.
Using this approach, I’ve managed to run Native after building integration-tests module:

mvn clean package -Pnative-image -DskipTests

And tests that are using @NativeImageTest are passing:

I found only one extension that is using native calls, which is Kafka streams for underlying rocksdb:
https://github.com/quarkusio/quarkus/blob/master/extensions/kafka-streams/deployment/src/main/java/io/quarkus/kafka/streams/deployment/KafkaStreamsProcessor.java#L113

They are registering this as JniRuntimeAccessBuildItem.

I think that we should continue to work on this ticket as a followup after the first quarkus PR:

https://github.com/datastax/cassandra-quarkus/pull/1/

and try to enable JNR support there (removing substitutions, register JNI).
We should use existing @NativeImageTest as a baseline to verify if the native is working.

Here is a conversation that may be helpful as well:
https://github.com/oracle/graal/issues/675

adutra commented 4 years ago

Imported from JAVA-2616, originally commented by @absurdfarce on Mon, 13 Apr 2020 16:20:38 GMT.


https://datastax-oss.atlassian.net/browse/JAVA-2663 will provide a more complete solution to this issue so there’s not really much else to do with this ticket.

adutra commented 4 years ago

Link imported from Jira: this issue (JAVA-2616) --- is resolved by ---> JAVA-2663 (issue #54).

adutra commented 4 years ago

Link imported from Jira: this issue (JAVA-2616) --- resolves ---> JAVA-2663 (issue #54).

adutra commented 4 years ago

Link imported from Jira: this issue (JAVA-2616) --- relates to ---> JAVA-2287 (issue #56).