facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

SIGSEGV in BlockBasedTable DumpDataBlocks (Kafka Streams) #11065

Open gharris1727 opened 1 year ago

gharris1727 commented 1 year ago

A project that I'm working on (Kafka Streams) uses RocksDB via JNI. While running tests, I encountered a SIGSEGV. Downstream issue: https://issues.apache.org/jira/browse/KAFKA-14555

Expected behavior

No SIGSEGV

Actual behavior

JVM crash with the following stack trace:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000001269f2f2c, pid=88913, tid=40199
#
# JRE version: OpenJDK Runtime Environment Corretto-17.0.4.9.1 (17.0.4.1+9) (build 17.0.4.1+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.4.9.1 (17.0.4.1+9-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, parallel gc, bsd-aarch64)
# Problematic frame:
# C  [librocksdbjni15989196819046251041.jnilib+0x2def2c]  _ZN7rocksdb15BlockBasedTable14DumpDataBlocksERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+0x1650

---------------  T H R E A D  ---------------
Current thread is native 
threadStack: [0x0000000171704000,0x0000000171787000],  sp=0x0000000171784e90,  free space=515k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [librocksdbjni15989196819046251041.jnilib+0x2def2c]  _ZN7rocksdb15BlockBasedTable14DumpDataBlocksERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+0x1650
C  [librocksdbjni15989196819046251041.jnilib+0x2cff98]  _ZN7rocksdb15BlockBasedTable28PrefetchIndexAndFilterBlocksERKNS_11ReadOptionsEPNS_18FilePrefetchBufferEPNS_20InternalIteratorBaseINS_5SliceEEEPS0_bRKNS_22BlockBasedTableOptionsEimmPNS_23BlockCacheLookupContextE+0x354
C  [librocksdbjni15989196819046251041.jnilib+0x2ce4d4]  _ZN7rocksdb15BlockBasedTable4OpenERKNS_11ReadOptionsERKNS_16ImmutableOptionsERKNS_10EnvOptionsERKNS_22BlockBasedTableOptionsERKNS_21InternalKeyComparatorEONSt3__110unique_ptrINS_22RandomAccessFileReaderENSG_14default_deleteISI_EEEEyPNSH_INS_11TableReaderENSJ_ISN_EEEERKNSG_10shared_ptrIKNS_14SliceTransformEEEbbibybPNS_17TailPrefetchStatsEPNS_16BlockCacheTracerEmRKNSG_12basic_stringIcNSG_11char_traitsIcEENSG_9allocatorIcEEEEy+0xaa0
C  [librocksdbjni15989196819046251041.jnilib+0x2bbd04]  _ZNK7rocksdb22BlockBasedTableFactory14NewTableReaderERKNS_11ReadOptionsERKNS_18TableReaderOptionsEONSt3__110unique_ptrINS_22RandomAccessFileReaderENS7_14default_deleteIS9_EEEEyPNS8_INS_11TableReaderENSA_ISE_EEEEb+0x8c
C  [librocksdbjni15989196819046251041.jnilib+0x18de18]  _ZN7rocksdb10TableCache14GetTableReaderERKNS_11ReadOptionsERKNS_11FileOptionsERKNS_21InternalKeyComparatorERKNS_14FileDescriptorEbbPNS_13HistogramImplEPNSt3__110unique_ptrINS_11TableReaderENSF_14default_deleteISH_EEEERKNSF_10shared_ptrIKNS_14SliceTransformEEEbibmNS_11TemperatureE+0x418
C  [librocksdbjni15989196819046251041.jnilib+0x18e5e8]  _ZN7rocksdb10TableCache9FindTableERKNS_11ReadOptionsERKNS_11FileOptionsERKNS_21InternalKeyComparatorERKNS_14FileDescriptorEPPNS_5Cache6HandleERKNSt3__110shared_ptrIKNS_14SliceTransformEEEbbPNS_13HistogramImplEbibmNS_11TemperatureE+0x22c
C  [librocksdbjni15989196819046251041.jnilib+0x18e96c]  _ZN7rocksdb10TableCache11NewIteratorERKNS_11ReadOptionsERKNS_11FileOptionsERKNS_21InternalKeyComparatorERKNS_12FileMetaDataEPNS_18RangeDelAggregatorERKNSt3__110shared_ptrIKNS_14SliceTransformEEEPPNS_11TableReaderEPNS_13HistogramImplENS_17TableReaderCallerEPNS_5ArenaEbimPKNS_11InternalKeyESW_b+0x1ac
C  [librocksdbjni15989196819046251041.jnilib+0x8fdc8]  _ZN7rocksdb13CompactionJob25ProcessKeyValueCompactionEPNS0_18SubcompactionStateE+0x1be4
C  [librocksdbjni15989196819046251041.jnilib+0x8d92c]  _ZN7rocksdb13CompactionJob3RunEv+0xed8
C  [librocksdbjni15989196819046251041.jnilib+0xfb318]  _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE+0xbc8
C  [librocksdbjni15989196819046251041.jnilib+0xf9484]  _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE+0xc0
C  [librocksdbjni15989196819046251041.jnilib+0xf6f58]  _ZN7rocksdb6DBImpl16BGWorkCompactionEPv+0x30
C  [librocksdbjni15989196819046251041.jnilib+0x3561dc]  _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm+0x1ec
C  [librocksdbjni15989196819046251041.jnilib+0x35645c]  _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv+0x7c
C  [librocksdbjni15989196819046251041.jnilib+0x357ed8]  _ZN7rocksdb13NewThreadPoolEi+0x2b0
C  [libsystem_pthread.dylib+0x726c]  _pthread_start+0x94

Full log: hs_err_pid88913.log

Steps to reproduce the behavior

  1. Clone Kafka https://github.com/apache/kafka from trunk
  2. Run ./gradlew streams:cleanTest streams:test

Note: I have only seen this failure once so far and have not yet verified these reproduction steps. I am using macOS Monterey 12.6 with an arm64/aarch64 Apple Silicon M1 Max.

alanpaxton commented 1 year ago

As it's tagged "Java" I've had a look at this, but have not got far. I wasn't able to reproduce it on an M1 Max running Ventura 13.1.

Looking at the stack trace, it appears to be crashing while dumping the SST file during compaction. However, I can't see where the dump is initiated, and I wonder whether it is either (a) being run manually (by the Kafka scripts?) or (b) part of a crash dump hook.
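For anyone else reading the trace, the mangled frame names are easier to follow once reduced to their qualified function names. Below is a rough sketch of how that could be done for frames like the ones in this report. The class `FrameNames` and the method `qualifiedName` are hypothetical names made up for illustration; they are not part of RocksDB or the JVM. The sketch decodes only the leading length-prefixed nested name of an Itanium C++ ABI symbol and ignores argument types, templates and substitutions, so it is no replacement for a real demangler such as `c++filt`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class FrameNames {

    // Returns "ns::Class::method" for symbols of the form _ZN<len><name>...E,
    // or the symbol itself (minus any "+0x..." offset) if it is not in that form.
    public static String qualifiedName(String symbol) {
        // Drop the "+0x1650"-style offset that hs_err logs append to each frame.
        int plus = symbol.indexOf('+');
        String s = plus >= 0 ? symbol.substring(0, plus) : symbol;

        // Only nested mangled names start with _ZN; leave others (e.g. _pthread_start) as-is.
        if (!s.startsWith("_ZN")) {
            return s;
        }

        int i = 3;
        // Skip cv-qualifiers on member functions (K = const, V = volatile, r = restrict).
        while (i < s.length() && "KVr".indexOf(s.charAt(i)) >= 0) {
            i++;
        }

        // Read <decimal length><identifier> pairs until something else shows up.
        List<String> parts = new ArrayList<>();
        while (i < s.length() && isAsciiDigit(s.charAt(i))) {
            int len = 0;
            while (i < s.length() && isAsciiDigit(s.charAt(i))) {
                len = len * 10 + (s.charAt(i) - '0');
                i++;
            }
            if (i + len > s.length()) {
                break; // truncated symbol; keep what was decoded so far
            }
            parts.add(s.substring(i, i + len));
            i += len;
        }
        return parts.isEmpty() ? s : String.join("::", parts);
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        // A few frames copied from the report above, top of stack first.
        String[] frames = {
            "_ZN7rocksdb15BlockBasedTable14DumpDataBlocksERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+0x1650",
            "_ZN7rocksdb13CompactionJob3RunEv+0xed8",
            "_ZN7rocksdb6DBImpl16BGWorkCompactionEPv+0x30",
            "_pthread_start+0x94"
        };
        for (String frame : frames) {
            System.out.println(qualifiedName(frame));
        }
    }
}
```

Read this way, the trace runs from the thread pool's background thread through `DBImpl::BGWorkCompaction` and `CompactionJob::Run` into `BlockBasedTable::Open`, with the top frame reported as `BlockBasedTable::DumpDataBlocks`.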

Whatever the core issue, I think it is a RocksDB core failure rather than the Java API passing something wrong in. @akankshamahajan15, could you or someone else on the core team with knowledge of compaction see if you can make more sense of it?

jbonofre commented 1 year ago

I had a similar issue on a project (not exactly the same trace though).

The SIGSEGV in my case was due to the column family handle. To avoid it, I had to "force" the loading of the descriptor to make sure it was populated:

ColumnFamilyHandle handle = ...;
handle.getDescriptor();

I hope it helps.