dragondgold opened this issue 10 months ago
@snleee ^
This kind of problem should not produce a SIGSEGV. I think this may be related to using LArray buffers when the index is larger than 2 GB.
One of the issues with LArray is that it doesn't check memory offsets, which may produce SIGSEGVs. The other issue with LArray is that it doesn't work on Java > 15, which is why we created our own library so Pinot can run on modern Java versions.
We can test whether the issue is caused by LArray by changing the library used. This won't fix the underlying issue, but it should keep the process from being killed when it happens. Could you run the same job on a Pinot cluster using Java 17 or 21? Alternatively, our library can be used on Java 11 by changing the value of the Pinot server property `pinot.offheap.buffer.factory`:

pinot.offheap.buffer.factory = org.apache.pinot.segment.spi.memory.unsafe.UnsafePinotBufferFactory
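For reference, a minimal sketch of where that property would typically be set, assuming the server is started with a config properties file (the file path below is only a placeholder, and the exact startup mechanism depends on your deployment):

```properties
# Hypothetical server config file, e.g. /path/to/pinot-server.conf (placeholder path),
# passed to the Pinot server at startup. This selects the Unsafe-based buffer factory
# so that LArray buffers are not used.
pinot.offheap.buffer.factory=org.apache.pinot.segment.spi.memory.unsafe.UnsafePinotBufferFactory
```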
Hey @gortiz, why would using >2 GB LArray buffers be an issue? Looking at their repo https://github.com/xerial/larray, it seems like the first thing they advertise is support for buffers larger than 2 GB?
LArray can be used to create buffers larger than 2 GB, but LArray is not maintained and is not safe (see more in https://github.com/apache/pinot/issues/12810). By "not safe" I mean that there is no offset check when accessing memory with LArray, which means an out-of-bounds access can touch arbitrary memory and crash the JVM with a SIGSEGV instead of throwing an exception.
For context, I'm not 100% sure this is the actual cause of the specific error reported here. In fact, in some very strange scenarios we have seen SIGSEGV errors even when using ByteBuffers, when the code is compiled with C2. But in general we recommend not using LArray, and in fact Pinot 1.2.0 does not use LArray by default.
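To illustrate what the missing offset check means in practice, here is a minimal, self-contained Java sketch (not Pinot or LArray code) contrasting a bounds-checked `ByteBuffer` read with an unchecked `sun.misc.Unsafe` read. The checked read fails with an exception; the unchecked read may print garbage or kill the JVM with a SIGSEGV, depending on what memory the out-of-range address happens to map to:

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

public class OffsetCheckDemo {
    public static void main(String[] args) throws Exception {
        // Bounds-checked access: ByteBuffer validates the index and throws
        // IndexOutOfBoundsException instead of touching invalid memory.
        ByteBuffer checked = ByteBuffer.allocateDirect(16);
        try {
            checked.getInt(1 << 20); // far past the 16-byte limit
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Checked access failed safely: " + e);
        }

        // Unchecked access: sun.misc.Unsafe performs no offset validation,
        // so reading past the 16-byte allocation may return garbage or crash
        // the whole JVM with a SIGSEGV.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        long addr = unsafe.allocateMemory(16);
        try {
            int value = unsafe.getInt(addr + (1 << 20)); // nothing stops this read
            System.out.println("Unchecked access 'succeeded' with garbage: " + value);
        } finally {
            unsafe.freeMemory(addr);
        }
    }
}
```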
When creating an inverted index on a large MV column (3000 integer values on average) from a parquet file with many rows (2 million rows) I get a `SIGSEGV` error.

Reducing the parquet file from 2M rows to 1.2M rows results in an index out of bounds error instead of a `SIGSEGV`. Reducing the row count even more, to 700k rows, works as expected.

My guess: when `number_of_rows * MV_column_length` is slightly over `Integer.MAX_VALUE` I get an index out of bounds error, and when it goes over `Integer.MAX_VALUE` by a lot (I don't know how much exactly) I get a `SIGSEGV` error. So I think the issue appears when using an inverted index and `number_of_rows * MV_column_length > Integer.MAX_VALUE`, probably because a 32-bit roaring bitmap is being used?
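To make the overflow hypothesis concrete, here is a small standalone Java sketch (purely illustrative, not the actual Pinot indexing code) showing how `number_of_rows * MV_column_length` behaves for the row counts mentioned above when the multiplication is done in 32-bit arithmetic:

```java
public class MvOffsetOverflowDemo {
    static void show(int numRows, int avgMvLength) {
        int wrapped = numRows * avgMvLength;          // 32-bit multiply, may silently wrap
        long widened = (long) numRows * avgMvLength;  // widen to 64 bits before multiplying
        System.out.printf("%,d rows x %,d values -> int = %,d, long = %,d%n",
                numRows, avgMvLength, wrapped, widened);
    }

    public static void main(String[] args) {
        show(700_000, 3_000);   // 2,100,000,000: just below Integer.MAX_VALUE, works
        show(1_200_000, 3_000); // 3,600,000,000: wraps to a negative int -> index out of bounds
        show(2_000_000, 3_000); // 6,000,000,000: wraps to a positive but wrong int -> bogus
                                // offset, a possible SIGSEGV with an unchecked buffer
    }
}
```

This matches the reported behaviour: slightly past `Integer.MAX_VALUE` the wrapped value is negative and a checked code path can reject it with an index out of bounds error, while much further past it the wrapped value looks valid again and an unchecked buffer may read the wrong memory.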