apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

SIGSEGV error when creating inverted index in MV column from large parquet files #12286

Open dragondgold opened 6 months ago

dragondgold commented 6 months ago

When creating an inverted index on a large MV column (3000 integer values per row on average) from a parquet file with many rows (2 million), I get a SIGSEGV error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f4d7fd1c9d8, pid=671, tid=808
#
# JRE version: OpenJDK Runtime Environment Corretto-11.0.20.9.1 (11.0.20.1+9) (build 11.0.20.1+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-11.0.20.9.1 (11.0.20.1+9-LTS, mixed mode, tiered, serial gc, linux-amd64)
# Problematic frame:
# J 10100 c2 org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(Lorg/apache/pinot/spi/data/readers/GenericRow;)V (228 bytes) @ 0x00007f4d7fd1c9d8 [0x00007f4d7fd1b580+0x0000000000001458]
#
# Core dump will be written. Default location: /opt/pinot/core.671
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-11/issues/
#

---------------  S U M M A R Y ------------

Command Line: -Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx110G -XX:+UseSerialGC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log -Dplugins.dir=/opt/pinot/plugins -Dapp.name=pinot-admin -Dapp.pid=671 -Dapp.repo=/opt/pinot/lib -Dapp.home=/opt/pinot -Dbasedir=/opt/pinot -Dorg.slf4j.simpleLogger.defaultLogLevel=debug -Dlog4j2.configurationFile=/opt/pinot/etc/conf/pinot-controller-log4j2.xml org.apache.pinot.tools.admin.PinotAdministrator LaunchDataIngestionJob -jobSpecFile /opt/data/job.yml

Host: AMD EPYC 7R13 Processor, 32 cores, 123G, Amazon Linux release 2 (Karoo)
Time: Fri Dec 22 15:56:34 2023 UTC elapsed time: 1015.369777 seconds (0d 0h 16m 55s)

---------------  T H R E A D  ---------------

Current thread (0x00007f4d90241000):  JavaThread "pool-3-thread-1" [_thread_in_Java, id=808, stack(0x00007f31da549000,0x00007f31da64a000)]

Stack: [0x00007f31da549000,0x00007f31da64a000],  sp=0x00007f31da6484f0,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 10100 c2 org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(Lorg/apache/pinot/spi/data/readers/GenericRow;)V (228 bytes) @ 0x00007f4d7fd1c9d8 [0x00007f4d7fd1b580+0x0000000000001458]
J 10106% c2 org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build()V (376 bytes) @ 0x00007f4d7fd247a0 [0x00007f4d7fd246a0+0x0000000000000100]
j  org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run()Ljava/lang/String;+228
j  org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(Lorg/apache/pinot/spi/ingestion/batch/spec/SegmentGenerationTaskSpec;Ljava/io/File;Ljava/net/URI;Ljava/io/File;)V+18
j  org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner$$Lambda$780.run()V+20
j  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4 java.base@11.0.20.1
j  java.util.concurrent.FutureTask.run()V+39 java.base@11.0.20.1
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.20.1
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.20.1
j  java.lang.Thread.run()V+11 java.base@11.0.20.1
v  ~StubRoutines::call_stub
V  [libjvm.so+0x8e13bb]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x39b
V  [libjvm.so+0x8df37d]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V  [libjvm.so+0x98ae7c]  thread_entry(JavaThread*, Thread*)+0x6c
V  [libjvm.so+0xedf730]  JavaThread::run()+0x280
V  [libjvm.so+0xedc0ff]  Thread::call_run()+0x14f
V  [libjvm.so+0xc78ea6]  thread_native_entry(Thread*)+0xe6

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00007f31602ff000

Register to memory mapping:

RAX=0x0 is NULL
RBX=0x00007f31602ff000 is an unknown value
RCX=0x00007f3c21365d58 is an oop: java.util.HashMap 
{0x00007f3c21365d58} - klass: 'java/util/HashMap'
 - ---- fields (total size 8 words):
 - transient strict 'keySet' 'Ljava/util/Set;' @16  NULL (0 0)
 - transient strict 'values' 'Ljava/util/Collection;' @24  a 'java/util/HashMap$Values'{0x00007f3c21370760} (21370760 7f3c)
 - transient 'size' 'I' @32  1
 - transient 'modCount' 'I' @36  1
 - 'threshold' 'I' @40  12 (c)
 - final 'loadFactor' 'F' @44  0.750000 (3f400000)
 - transient strict 'table' '[Ljava/util/HashMap$Node;' @48  a 'java/util/HashMap$Node'[16] {0x00007f3c21370778} (21370778 7f3c)
 - transient strict 'entrySet' 'Ljava/util/Set;' @56  NULL (0 0)
RDX=0x00007f31602ff000 is an unknown value
RSP=0x00007f31da6484f0 is pointing into the stack for thread: 0x00007f4d90241000
RBP=0x00007f319473b358 is a pointer to class: 
xerial.larray.mmap.MMapBuffer {0x00007f319473b358}
 - instance size:     9
 - klass size:        122
 - access:            public synchronized 
 - state:             fully_initialized
 - name:              'xerial/larray/mmap/MMapBuffer'
 - super:             'xerial/larray/buffer/LBufferAPI'
 - sub:               
 - arrays:            NULL
 - methods:           Array<T>(0x00007f319473ad68)
 - method ordering:   Array<T>(0x00007f31e0155018)
 - default_methods:   Array<T>(0x0000000000000000)
 - local interfaces:  Array<T>(0x00007f31e0155060)
 - trans. interfaces: Array<T>(0x00007f31e0155060)
 - constants:         constant pool [227] {0x00007f319473a500} for 'xerial/larray/mmap/MMapBuffer' cache=0x00007f319473d190
 - class loader data:  loader data: 0x00007f4d90101eb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00007f3b223090f8}
 - host class:        NULL
 - source file:       'MMapBuffer.java'
 - class annotations:       Array<T>(0x0000000000000000)
 - class type annotations:  Array<T>(0x0000000000000000)
 - field annotations:       Array<T>(0x0000000000000000)
 - field type annotations:  Array<T>(0x0000000000000000)
 - inner classes:     Array<T>(0x00007f31e0155030)
 - nest members:     Array<T>(0x00007f31e0155030)
 - java mirror:       a 'java/lang/Class'{0x00007f3ba37080b8} = 'xerial/larray/mmap/MMapBuffer'
 - vtable length      60  (start addr: 0x00007f319473b528)
 - itable length      2 (start addr: 0x00007f319473b708)
 - ---- static fields (0 words):
 - ---- non-static fields (7 words):
 - public 'm' 'Lxerial/larray/buffer/Memory;' @16 
 - private final 'fd' 'J' @24 
 - private final 'address' 'J' @32

Reducing the parquet file from 2M rows to 1.2M rows results in an index-out-of-bounds error instead of a SIGSEGV. Reducing the row count even further, to 700k rows, works as expected.

My guess: when number_of_rows * MV_column_length is slightly over Integer.MAX_VALUE I get an index-out-of-bounds error, and when it exceeds Integer.MAX_VALUE by a larger margin (I don't know exactly how much) I get a SIGSEGV. So I think the issue occurs when an inverted index is used and number_of_rows * MV_column_length > Integer.MAX_VALUE, probably because a 32-bit roaring bitmap is being used?
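The arithmetic behind this guess can be checked directly. The following is a minimal Java sketch, not Pinot code: the class and method names are hypothetical, and treating rows × average MV length as the relevant limit is the reporter's hypothesis, not something confirmed in Pinot's source. It computes the product exactly in long, detects int overflow with `Math.multiplyExact`, and shows what plain int arithmetic would silently produce.

```java
/**
 * Hypothetical sketch (not Pinot code): checks whether
 * rows x average MV values per row still fits in a Java int.
 */
public final class MvOverflowCheck {

    /** Exact total number of MV entries, computed in long so it cannot overflow here. */
    static long flattenedValueCount(long rows, long avgValuesPerRow) {
        return rows * avgValuesPerRow;
    }

    /** The same product computed in int arithmetic: wraps around silently on overflow. */
    static int flattenedValueCountAsInt(int rows, int avgValuesPerRow) {
        return rows * avgValuesPerRow;
    }

    /** Math.multiplyExact throws ArithmeticException when the int product overflows. */
    static boolean overflowsInt(int rows, int avgValuesPerRow) {
        try {
            Math.multiplyExact(rows, avgValuesPerRow);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        final int avgValuesPerRow = 3_000; // average MV length from the report
        int[] rowCounts = {700_000, 1_200_000, 2_000_000}; // row counts from the report
        for (int rows : rowCounts) {
            System.out.printf("rows=%d exact=%d overflowsInt=%b intResult=%d%n",
                    rows,
                    flattenedValueCount(rows, avgValuesPerRow),
                    overflowsInt(rows, avgValuesPerRow),
                    flattenedValueCountAsInt(rows, avgValuesPerRow));
        }
    }
}
```

With these inputs only the 700k-row case (2,100,000,000 values) fits under Integer.MAX_VALUE (2,147,483,647); 1.2M rows gives 3,600,000,000 and 2M rows gives 6,000,000,000. In plain int arithmetic those two products would wrap to -694,967,296 and 1,705,032,704 respectively. This matches the thresholds observed above, but it does not show where, or whether, Pinot performs such a computation.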

ksnijjer commented 5 months ago

@snleee ^

gortiz commented 5 months ago

This kind of problem should not produce a SIGSEGV. I think this may be related to using LArray buffers when the index is larger than 2 GB.

One of the issues with LArray is that it doesn't check memory offsets, which may produce SIGSEGVs. The other issue is that LArray doesn't work on Java versions above 15. Therefore we created our own library so that Pinot can run on modern Java versions.
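The difference between checked and unchecked offsets can be illustrated with a minimal Java sketch. It assumes a hypothetical `CheckedBuffer` class, with a direct `ByteBuffer` standing in for the off-heap or memory-mapped region; it is not how LArray or Pinot's own buffer library is implemented. The idea is only that validating a long offset before any access (and before narrowing it to int) turns a bad access into an `IndexOutOfBoundsException`, whereas raw address arithmetic has no such check and can reach unmapped memory.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch, not Pinot's or LArray's API: a buffer addressed with
 * long offsets that validates every access before touching memory.
 */
public final class CheckedBuffer {
    private final ByteBuffer buffer; // stand-in for an off-heap / memory-mapped region
    private final long size;

    public CheckedBuffer(int capacity) {
        this.buffer = ByteBuffer.allocateDirect(capacity); // zero-initialized
        this.size = capacity;
    }

    /** Throws instead of letting an out-of-range access reach memory. */
    private void checkRange(long offset, long length) {
        // Written as offset > size - length to avoid overflow in offset + length.
        if (offset < 0 || length < 0 || offset > size - length) {
            throw new IndexOutOfBoundsException(
                    "offset=" + offset + ", length=" + length + ", size=" + size);
        }
    }

    public void putInt(long offset, int value) {
        checkRange(offset, Integer.BYTES);
        // Narrowing is safe only after the check: offset < size <= Integer.MAX_VALUE.
        buffer.putInt((int) offset, value);
    }

    public int getInt(long offset) {
        checkRange(offset, Integer.BYTES);
        return buffer.getInt((int) offset);
    }

    public static void main(String[] args) {
        CheckedBuffer buf = new CheckedBuffer(16);
        buf.putInt(12, 42);
        System.out.println(buf.getInt(12));
        try {
            // (int) (1L << 32) == 0, so narrowing before checking would silently hit offset 0.
            buf.putInt(1L << 32, 7);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running `main` prints 42 and then a "rejected: ..." line for the out-of-range write. The check costs a comparison per access, which is the trade-off an unchecked library avoids at the price of crashing the process on a bad offset.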

We can test whether the issue is triggered by LArray by changing the library used. This won't fix the underlying issue, but it shouldn't kill the process if the error happens. Could you run the same job on a Pinot cluster using Java 17 or 21? Alternatively, our library can be used on Java 11 by setting the Pinot server property pinot.offheap.buffer.factory:

pinot.offheap.buffer.factory = org.apache.pinot.segment.spi.memory.unsafe.UnsafePinotBufferFactory