duckdb / duckdb-java

DuckDB JDBC Driver
https://duckdb.org/docs/api/java.html
MIT License

[JDBC] DuckDB JDBC driver SIGSEGV the JVM since 0.9.0 #14

Open loicmathieu opened 11 months ago

loicmathieu commented 11 months ago

What happens?

Since version 0.9.0, using the DuckDB JDBC driver in a Java application makes the application crash with a SIGSEGV. The Java version is 17.0.5 (also tested on 17.0.8.1).

First, a Java exception is thrown:

java.sql.SQLException: random_device could not be read
    at org.duckdb.DuckDBNative.duckdb_jdbc_startup(Native Method)
    at org.duckdb.DuckDBConnection.newConnection(DuckDBConnection.java:48)
    at org.duckdb.DuckDBDriver.connect(DuckDBDriver.java:38)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
    at io.kestra.plugin.jdbc.JdbcConnectionInterface.connection(JdbcConnectionInterface.java:63)
    at io.kestra.plugin.jdbc.AbstractJdbcQuery.run(AbstractJdbcQuery.java:77)
    at io.kestra.plugin.jdbc.duckdb.Query.run(Query.java:148)
    at io.kestra.plugin.jdbc.duckdb.Query.run(Query.java:31)
    at io.kestra.core.runners.Worker$WorkerThread.run(Worker.java:674)  

Then the JVM crashes:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f523603fd60, pid=37746, tid=39346
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.5+8 (17.0.5+8) (build 17.0.5+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (17.0.5+8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  0x00007f523603fd60
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to <redacted>)
#
# An error report file with more information is saved as:
# <redacted>
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

It works well with 0.8.0.

To Reproduce

Here is the SQL query:

      INSTALL httpfs;
      SELECT Title, max("Days In Top 10") 
      from (SELECT * FROM read_parquet('s3://duckdb-md-dataset-121/netflix_daily_top_10.parquet'))
      where Type='Movie'
      GROUP BY Title
      ORDER BY max("Days In Top 10") desc
      limit 5;

The code uses the standard Java JDBC API (Connection and Statement), but it cannot easily be extracted because it runs via the Kestra DuckDB plugin.
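Since the Kestra plugin goes through plain JDBC, a standalone program along these lines may help isolate the problem from Kestra. This is a sketch, not the reporter's code: it assumes the DuckDB JDBC driver (org.duckdb:duckdb_jdbc) is on the classpath and that the S3 bucket is readable from the machine; the class name DuckDbRepro is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical standalone reproduction of the Kestra query using plain JDBC.
// Assumes the DuckDB JDBC driver is on the classpath and S3 access works.
class DuckDbRepro {

    // The statements from the issue, split so each one gets its own execute call.
    static List<String> statements() {
        return List.of(
            "INSTALL httpfs",
            "SELECT Title, max(\"Days In Top 10\") "
                + "FROM (SELECT * FROM read_parquet('s3://duckdb-md-dataset-121/netflix_daily_top_10.parquet')) "
                + "WHERE Type = 'Movie' "
                + "GROUP BY Title "
                + "ORDER BY max(\"Days In Top 10\") DESC "
                + "LIMIT 5");
    }

    public static void main(String[] args) throws SQLException {
        // "jdbc:duckdb:" opens an in-memory database. Connecting runs
        // duckdb_jdbc_startup, which is where the SQLException in the
        // stack trace above was thrown.
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            List<String> sql = statements();
            stmt.execute(sql.get(0));
            try (ResultSet rs = stmt.executeQuery(sql.get(1))) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getString(2));
                }
            }
        }
    }
}
```

If this program alone does not crash, the difference is likely something else loaded into the Kestra JVM, which is where the rest of this thread ends up.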

OS:

Ubuntu 23.04

DuckDB Version:

0.9.0

DuckDB Client:

Java JDBC

Full Name:

Loïc Mathieu

Affiliation:

Kestra

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

loicmathieu commented 11 months ago

See the corresponding core dump log: hs_err_pid37746.log

carlopi commented 11 months ago

Could you try v0.9.1, which contains some potentially relevant fixes?

loicmathieu commented 11 months ago

Already tried: we have exactly the same issue (the stack trace and core dump are from 0.9.1; I downgraded to 0.9.0 and hit the same issue).

carlopi commented 11 months ago

The fix in the httpfs extension went live a couple of hours ago. Could you give it another try, running FORCE INSTALL httpfs once beforehand, as explained here: https://github.com/duckdb/duckdb/issues/9340#issuecomment-1767876982?

This would NOT solve the random_device issue, but might solve the crash if they are independent.

loicmathieu commented 11 months ago

Even with FORCE INSTALL httpfs I have the same issue.

By the way, the random_device issue didn't appear on 0.8.0, so it may not be the same issue as duckdb/duckdb#9340.

Mause commented 11 months ago

Can you try the solution mentioned in https://github.com/duckdb/duckdb/issues/8708#issuecomment-1714549999 ?

loicmathieu commented 11 months ago

@Mause setting LD_PRELOAD works; however, this is not a proper fix for us, as we cannot control the environment in which our users run our code.

Mause commented 11 months ago

It's less a permanent solution than a way to confirm it's the same issue, one we've seen before (though that was with TensorFlow and Python).

Mause commented 11 months ago

For our own reference, are you using any other java libraries that are backed by a C++ library?

Mause commented 11 months ago

I notice /tmp/librocksdbjni14687608028396635175.so is mentioned in the dump. Do you still see the issue if you exclude that library, or don't load it before DuckDB?

loicmathieu commented 11 months ago

> For our own reference, are you using any other java libraries that are backed by a C++ library?

It is very difficult to answer this question as this kind of information is usually not documented. We are using literally hundreds of libraries (maybe even more than a thousand as we have 400 plugins).

Our runtime uses Netty which, for sure, uses native libraries.

loicmathieu commented 11 months ago

> I notice /tmp/librocksdbjni14687608028396635175.so is mentioned in the dump. Do you still see the issue if you exclude that library, or don't load it before DuckDB?

Oh! This could explain why we only see this with our Kafka runner and not with our JDBC runner (Kestra can be launched with either of two runners). So yes, I confirm it works when we don't use Kafka (and therefore don't use RocksDB).
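To check from inside an application whether a RocksDB JNI library (like the /tmp/librocksdbjni*.so in the dump) is already mapped into the JVM before DuckDB is first used, a Linux-only diagnostic such as the following could help. It is a hypothetical helper (MappedLibraries is an illustrative name) that reads /proc/self/maps; that file lists mappings by address, so it shows which libraries are loaded, not the order in which they were loaded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic (Linux only): lists shared objects mapped into this JVM.
class MappedLibraries {

    // Extracts distinct ".so" paths from /proc/<pid>/maps lines.
    // Line format: address perms offset dev inode [pathname]
    static Set<String> sharedObjects(List<String> mapsLines) {
        Set<String> result = new TreeSet<>();
        for (String line : mapsLines) {
            String[] fields = line.trim().split("\\s+");
            // Anonymous mappings have no pathname column and are skipped.
            if (fields.length >= 6) {
                String path = fields[5];
                if (path.contains(".so")) {
                    result.add(path);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of("/proc/self/maps"));
        // Only print the libraries relevant to this issue.
        for (String lib : sharedObjects(lines)) {
            if (lib.contains("rocksdb") || lib.contains("duckdb")) {
                System.out.println(lib);
            }
        }
    }
}
```

Calling this right before the first DuckDB connection would show whether librocksdbjni is already present at that point.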

Mause commented 11 months ago

For our future reference, this is enough to trigger the crash: https://github.com/Mause/duckdb_rocksdb_crash/blob/main/src/test/java/com/mycompany/app/AppTest.java

Mause commented 11 months ago

Or at least a crash; I'm not certain it's the same one.

loicmathieu commented 9 months ago

Hi, do you have any news on this? It prevents us from upgrading to driver version 0.9.2, and therefore from using MotherDuck, which only supports DuckDB 0.9.2!

elefeint commented 9 months ago

I wonder if this issue appeared in 0.9.x as a side effect of the release build moving to manylinux.

I've built a local version of the DuckDB JDBC driver from the codebase at the v0.9.2 tag using Ubuntu 22.04, and @Mause's reproducer from https://github.com/duckdb/duckdb-java/issues/14 no longer crashes. (Different JVMs also behave differently: the Ubuntu build of OpenJDK does not crash even with the released JDBC driver, but that's likely due to a different library loading order.)

@Y-- helped me look at the difference between the two drivers, and it seems the manylinux-built driver depends on two extra libraries that the Ubuntu-built driver does not: libdl.so.2 and libpthread.so.0.

/tmp/official> ldd libduckdb_java.so_linux_amd64
    linux-vdso.so.1 (0x00007ffc2e9e0000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7a39e39000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7a39e34000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7a37200000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7a37519000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7a39e14000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7a36e00000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f7a39e52000)

/tmp/mine> ldd libduckdb_java.so_linux_amd64
    linux-vdso.so.1 (0x00007ffcbcb7f000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe37ac00000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe3802f0000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe3802d0000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe37a800000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fe3803eb000)

tchiotludo commented 7 months ago

Any update on how to fix it? We have users stuck on version 0.8 who are asking for features from the latest version.

Mause commented 7 months ago

> Any update on how to fix it? We have users stuck on version 0.8 who are asking for features from the latest version.

Does the LD_PRELOAD workaround fix it for you as well?

loicmathieu commented 7 months ago

@Mause yes, it works, but as I said, we cannot control our users' environment, so it's not a solution.

armetiz commented 4 months ago

I'm a Kestra user.

Do you think this problem will be fixed in an upcoming DuckDB release, or do I have to rely on the LD_PRELOAD workaround?

armetiz commented 3 months ago

I tried to reproduce the error to help resolve this issue, but it works (no crash).

I created a Dockerfile based on the latest eclipse-temurin image, and a Java application that fetches a remote Parquet file.

The SQL results are displayed: ✅

Both 0.9.0 and 0.10.3 are working.

Here is the gist with all the files used: https://gist.github.com/armetiz/e4ffd81189eb334c5acdf3e9e9796940

Is there anything else I can try to reproduce the problem and help find a solution?

Regards


Output:

➜  duckdb-jdbc docker build -t helloworld .
➜  duckdb-jdbc docker run helloworld:latest
DuckDB - About SIGSEGV
01001_1, 1001_1, 01001, 1, bureau 1,  , Salle des fêtes, 01400, abergement clemenciat, 448, 448.0, 01001_0001, 
01002_1, 1002_1, 01002, 1, mairie, 1, Place de la Mairie, 01640, l abergement de varey, 157, 143.0, 01002_0001, 
01004_1, 1004_1, 01004, 1, b1 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 633, 630.0, 01004_0001, 
01004_2, 1004_2, 01004, 2, b2 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 640, 638.0, 01004_0002, 
01004_3, 1004_3, 01004, 3, b3 chateau des echelles,  , RUE DES ARENES, 01500, amberieu en bugey, 736, 730.0, 01004_0003, 
01004_4, 1004_4, 01004, 4, b4 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 532, 527.0, 01004_0004, 
01004_5, 1004_5, 01004, 5, b5 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 531, 529.0, 01004_0005, 
01004_6, 1004_6, 01004, 6, b6 groupe scolaire jules ferry,  , RUE VICTOR HUGO, 01500, amberieu en bugey, 628, 627.0, 01004_0006, 
01004_7, 1004_7, 01004, 7, b7 ecole maternelle de tiret,  , RUE JACQUES PREVERT, 01500, amberieu en bugey, 582, 577.0, 01004_0007, 
01004_8, 1004_8, 01004, 8, b8 ecole maternelle de tiret,  , RUE JACQUES PREVERT, 01500, amberieu en bugey, 691, 688.0, 01004_0008, 

loicmathieu commented 3 months ago

@armetiz on Kestra, this issue only occurs if the RocksDB native library is loaded before the DuckDB native library, which happens in Kestra EE.
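Given that the crash was only reported when the RocksDB native library is loaded before DuckDB's, one possible mitigation, untested here and not an official fix, is to force DuckDB's native library to load first at application startup. The sketch assumes both duckdb_jdbc and rocksdbjni are on the classpath; it calls org.rocksdb.RocksDB.loadLibrary() through reflection only so that it compiles without RocksDB. NativeLoadOrder and the --rocksdb-first flag are hypothetical names; passing --rocksdb-first reproduces the ordering reported to crash.

```java
import java.lang.reflect.Method;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mitigation sketch: load DuckDB's native library before RocksDB's.
class NativeLoadOrder {

    // Opening (and immediately closing) an in-memory connection makes the
    // DuckDB JDBC driver extract and load libduckdb_java.
    static void loadDuckDb() throws SQLException {
        try (Connection ignored = DriverManager.getConnection("jdbc:duckdb:")) {
            // Nothing to do: the native library load is the side effect we want.
        }
    }

    // Equivalent to calling org.rocksdb.RocksDB.loadLibrary() directly.
    static void loadRocksDb() throws ReflectiveOperationException {
        Method load = Class.forName("org.rocksdb.RocksDB").getMethod("loadLibrary");
        load.invoke(null);
    }

    // Initialisation order: DuckDB first unless the crashing order is requested.
    static List<String> order(boolean duckDbFirst) {
        List<String> steps = new ArrayList<>(List.of("duckdb", "rocksdb"));
        if (!duckDbFirst) {
            Collections.reverse(steps);
        }
        return steps;
    }

    public static void main(String[] args) throws Exception {
        boolean rocksDbFirst = args.length > 0 && args[0].equals("--rocksdb-first");
        for (String step : order(!rocksDbFirst)) {
            if (step.equals("duckdb")) {
                loadDuckDb();
            } else {
                loadRocksDb();
            }
        }
        System.out.println("Both native libraries loaded");
    }
}
```

Whether load order alone is enough is unclear: the analysis earlier in the thread also points at the manylinux build environment, so a rebuilt driver may be the more robust fix.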

armetiz commented 3 months ago

Hi @Mause, I tried to reproduce your Maven configuration inside a Docker container.

But, as you can see, I could not reproduce the error: https://github.com/armetiz/dockerfile-maven-duckdb-rockdbs

elefeint commented 1 month ago

Reproducing this requires building DuckDB on manylinux2014 but running the Java application on a modern system. Here is a Dockerfile that reproduces the issue with a debug build of DuckDB: Dockerfile.txt

Output:

0.256 *** BEFORE LOADING ROCKSDB ***
0.363 *** AFTER LOADING ROCKSDB ***
2.095 #
2.095 # A fatal error has been detected by the Java Runtime Environment:                                                                                      
2.095 #
2.095 #  SIGSEGV (0xb) at pc=0x00007ee3aca595cc, pid=7, tid=8
2.095 #
2.095 # JRE version: OpenJDK Runtime Environment Temurin-17.0.12+7 (17.0.12+7) (build 17.0.12+7)
2.095 # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.12+7 (17.0.12+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
2.095 # Problematic frame:
2.095 # C  [libduckdb_java10883698283861250744.so+0x2cec5cc]  duckdb::Vector::GetVectorType() const+0xc
2.096 #
2.096 # Core dump will be written. Default location: //core.7
2.096 #
2.096 # An error report file with more information is saved as:
2.096 # //hs_err_pid7.log
2.249 #
2.249 # If you would like to submit a bug report, please visit:
2.249 #   https://github.com/adoptium/adoptium-support/issues
2.249 # The crash happened outside the Java Virtual Machine in native code.
2.249 # See problematic frame for where to report the bug.
2.249 #
2.460 Aborted (core dumped)