apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

SIGSEGV error when creating inverted index in MV column from large parquet files #12286

Open dragondgold opened 10 months ago

dragondgold commented 10 months ago

When creating an inverted index on a large MV column (3000 integer values per row on average) from a parquet file with many rows (2 million), I get a SIGSEGV error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f4d7fd1c9d8, pid=671, tid=808
#
# JRE version: OpenJDK Runtime Environment Corretto-11.0.20.9.1 (11.0.20.1+9) (build 11.0.20.1+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-11.0.20.9.1 (11.0.20.1+9-LTS, mixed mode, tiered, serial gc, linux-amd64)
# Problematic frame:
# J 10100 c2 org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(Lorg/apache/pinot/spi/data/readers/GenericRow;)V (228 bytes) @ 0x00007f4d7fd1c9d8 [0x00007f4d7fd1b580+0x0000000000001458]
#
# Core dump will be written. Default location: /opt/pinot/core.671
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-11/issues/
#

---------------  S U M M A R Y ------------

Command Line: -Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx110G -XX:+UseSerialGC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log -Dplugins.dir=/opt/pinot/plugins -Dapp.name=pinot-admin -Dapp.pid=671 -Dapp.repo=/opt/pinot/lib -Dapp.home=/opt/pinot -Dbasedir=/opt/pinot -Dorg.slf4j.simpleLogger.defaultLogLevel=debug -Dlog4j2.configurationFile=/opt/pinot/etc/conf/pinot-controller-log4j2.xml org.apache.pinot.tools.admin.PinotAdministrator LaunchDataIngestionJob -jobSpecFile /opt/data/job.yml

Host: AMD EPYC 7R13 Processor, 32 cores, 123G, Amazon Linux release 2 (Karoo)
Time: Fri Dec 22 15:56:34 2023 UTC elapsed time: 1015.369777 seconds (0d 0h 16m 55s)

---------------  T H R E A D  ---------------

Current thread (0x00007f4d90241000):  JavaThread "pool-3-thread-1" [_thread_in_Java, id=808, stack(0x00007f31da549000,0x00007f31da64a000)]

Stack: [0x00007f31da549000,0x00007f31da64a000],  sp=0x00007f31da6484f0,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 10100 c2 org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(Lorg/apache/pinot/spi/data/readers/GenericRow;)V (228 bytes) @ 0x00007f4d7fd1c9d8 [0x00007f4d7fd1b580+0x0000000000001458]
J 10106% c2 org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build()V (376 bytes) @ 0x00007f4d7fd247a0 [0x00007f4d7fd246a0+0x0000000000000100]
j  org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run()Ljava/lang/String;+228
j  org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(Lorg/apache/pinot/spi/ingestion/batch/spec/SegmentGenerationTaskSpec;Ljava/io/File;Ljava/net/URI;Ljava/io/File;)V+18
j  org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner$$Lambda$780.run()V+20
j  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4 java.base@11.0.20.1
j  java.util.concurrent.FutureTask.run()V+39 java.base@11.0.20.1
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.20.1
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.20.1
j  java.lang.Thread.run()V+11 java.base@11.0.20.1
v  ~StubRoutines::call_stub
V  [libjvm.so+0x8e13bb]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x39b
V  [libjvm.so+0x8df37d]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V  [libjvm.so+0x98ae7c]  thread_entry(JavaThread*, Thread*)+0x6c
V  [libjvm.so+0xedf730]  JavaThread::run()+0x280
V  [libjvm.so+0xedc0ff]  Thread::call_run()+0x14f
V  [libjvm.so+0xc78ea6]  thread_native_entry(Thread*)+0xe6

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00007f31602ff000

Register to memory mapping:

RAX=0x0 is NULL
RBX=0x00007f31602ff000 is an unknown value
RCX=0x00007f3c21365d58 is an oop: java.util.HashMap 
{0x00007f3c21365d58} - klass: 'java/util/HashMap'
 - ---- fields (total size 8 words):
 - transient strict 'keySet' 'Ljava/util/Set;' @16  NULL (0 0)
 - transient strict 'values' 'Ljava/util/Collection;' @24  a 'java/util/HashMap$Values'{0x00007f3c21370760} (21370760 7f3c)
 - transient 'size' 'I' @32  1
 - transient 'modCount' 'I' @36  1
 - 'threshold' 'I' @40  12 (c)
 - final 'loadFactor' 'F' @44  0.750000 (3f400000)
 - transient strict 'table' '[Ljava/util/HashMap$Node;' @48  a 'java/util/HashMap$Node'[16] {0x00007f3c21370778} (21370778 7f3c)
 - transient strict 'entrySet' 'Ljava/util/Set;' @56  NULL (0 0)
RDX=0x00007f31602ff000 is an unknown value
RSP=0x00007f31da6484f0 is pointing into the stack for thread: 0x00007f4d90241000
RBP=0x00007f319473b358 is a pointer to class: 
xerial.larray.mmap.MMapBuffer {0x00007f319473b358}
 - instance size:     9
 - klass size:        122
 - access:            public synchronized 
 - state:             fully_initialized
 - name:              'xerial/larray/mmap/MMapBuffer'
 - super:             'xerial/larray/buffer/LBufferAPI'
 - sub:               
 - arrays:            NULL
 - methods:           Array<T>(0x00007f319473ad68)
 - method ordering:   Array<T>(0x00007f31e0155018)
 - default_methods:   Array<T>(0x0000000000000000)
 - local interfaces:  Array<T>(0x00007f31e0155060)
 - trans. interfaces: Array<T>(0x00007f31e0155060)
 - constants:         constant pool [227] {0x00007f319473a500} for 'xerial/larray/mmap/MMapBuffer' cache=0x00007f319473d190
 - class loader data:  loader data: 0x00007f4d90101eb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00007f3b223090f8}
 - host class:        NULL
 - source file:       'MMapBuffer.java'
 - class annotations:       Array<T>(0x0000000000000000)
 - class type annotations:  Array<T>(0x0000000000000000)
 - field annotations:       Array<T>(0x0000000000000000)
 - field type annotations:  Array<T>(0x0000000000000000)
 - inner classes:     Array<T>(0x00007f31e0155030)
 - nest members:     Array<T>(0x00007f31e0155030)
 - java mirror:       a 'java/lang/Class'{0x00007f3ba37080b8} = 'xerial/larray/mmap/MMapBuffer'
 - vtable length      60  (start addr: 0x00007f319473b528)
 - itable length      2 (start addr: 0x00007f319473b708)
 - ---- static fields (0 words):
 - ---- non-static fields (7 words):
 - public 'm' 'Lxerial/larray/buffer/Memory;' @16 
 - private final 'fd' 'J' @24 
 - private final 'address' 'J' @32

Reducing the parquet file from 2M rows to 1.2M rows results in an index out of bounds error instead of a SIGSEGV. Reducing the row count even further, to 700k rows, works as expected.

My guess: when number_of_rows * MV_column_length is slightly over Integer.MAX_VALUE I get an index out of bounds error, and when it exceeds Integer.MAX_VALUE by a lot (I don't know exactly how much) I get a SIGSEGV. So I think the issue occurs when using an inverted index with number_of_rows * MV_column_length > Integer.MAX_VALUE, probably because a 32-bit roaring bitmap is being used?
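To make the arithmetic behind this guess concrete, here is a minimal sketch that multiplies the three row counts from this report by the average of 3000 values per row and compares each product with Integer.MAX_VALUE (2,147,483,647). The class and method names are hypothetical and are not Pinot code. The `(int)` cast only illustrates what a 32-bit narrowing would produce; whether Pinot performs such a narrowing anywhere is an assumption that this thread does not confirm.

```java
public class MvEntryCount {
    // Total number of MV values across all rows, computed in 64-bit arithmetic.
    // Math.multiplyExact throws ArithmeticException if even a long would overflow.
    static long totalEntries(long rows, long avgValuesPerRow) {
        return Math.multiplyExact(rows, avgValuesPerRow);
    }

    // True when the total no longer fits in a signed 32-bit int.
    static boolean exceedsIntRange(long rows, long avgValuesPerRow) {
        return totalEntries(rows, avgValuesPerRow) > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long avgValuesPerRow = 3000L; // average MV length from the report
        for (long rows : new long[] {700_000L, 1_200_000L, 2_000_000L}) {
            long total = totalEntries(rows, avgValuesPerRow);
            System.out.println(rows + " rows -> " + total + " values"
                + ", exceeds int range: " + exceedsIntRange(rows, avgValuesPerRow)
                + ", narrowed to int: " + (int) total);
        }
    }
}
```

Only the 700k-row case (2,100,000,000 values) stays below Integer.MAX_VALUE, which matches the observation that it is the only file that ingests cleanly.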

ksnijjer commented 9 months ago

@snleee ^

gortiz commented 9 months ago

This kind of problem should not produce a SIGSEGV. I think this may be related to using LArray buffers when the index is larger than 2GB.

One issue with LArray is that it doesn't check memory offsets, which may produce SIGSEGVs. The other issue is that it doesn't work on Java > 15. We therefore created our own library so that Pinot can run on modern Java versions.

We can test whether the issue is triggered by LArray by changing the library used. This is not going to fix the underlying issue, but it should keep the process from being killed when it happens. Could you run the same job on a Pinot cluster using Java 17 or 21? Alternatively, our library can be used on Java 11 by changing the value of the Pinot server property pinot.offheap.buffer.factory:

pinot.offheap.buffer.factory = org.apache.pinot.segment.spi.memory.unsafe.UnsafePinotBufferFactory

dd-willgan commented 3 months ago

Hey @gortiz, why would using > 2GB LArray buffers be an issue? Looking at their repo https://github.com/xerial/larray, it seems like the first thing they advertise is that buffers larger than 2GB are supported?

gortiz commented 2 months ago

LArray can be used to create buffers larger than 2GB, but LArray is not maintained and not safe (see more in https://github.com/apache/pinot/issues/12810). By "not safe" I mean that there is no offset check when accessing memory with LArray, which means that:

  1. it can be used to execute buffer overflow attacks (something we are not used to in Java)
  2. an error in an offset usually means that an illegal memory address is accessed, which produces a SIGSEGV.
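As an illustration of what an offset check buys, here is a minimal sketch. It is not the LArray or Pinot implementation, and `CheckedAccess` and `checkOffset` are hypothetical names. A `java.nio.ByteBuffer` validates the index and throws a recoverable `IndexOutOfBoundsException`, and a wrapper over raw memory could apply the same kind of guard before touching an address.

```java
import java.nio.ByteBuffer;

public class CheckedAccess {
    // Rejects any access whose [offset, offset + length) range falls outside
    // a buffer of the given size. Written as "offset > size - length" so the
    // comparison itself cannot overflow for non-negative inputs.
    static void checkOffset(long offset, long length, long size) {
        if (offset < 0 || length < 0 || offset > size - length) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length + ", size=" + size);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16);
        try {
            buffer.getInt(100); // index beyond the 16-byte limit
        } catch (IndexOutOfBoundsException e) {
            System.out.println("ByteBuffer rejected the access: " + e);
        }

        checkOffset(12, 4, 16); // last valid 4-byte read, passes silently
        try {
            checkOffset(13, 4, 16); // would read one byte past the end
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Guard rejected the access: " + e.getMessage());
        }
    }
}
```

With a guard like this, a wrong offset surfaces as a Java exception that the job can report, instead of an illegal memory access that terminates the JVM.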

For context, I'm not 100% sure this is the actual cause of the specific error reported here. In fact, in some very strange scenarios we have seen SIGSEGV errors even when using ByteBuffers when the code is compiled with C2. But in general we recommend not using LArray, and in fact Pinot 1.2.0 does not use LArray by default.