apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Realtime Ingestion Stalled : Consuming Segments Not converting to Completed #13626

Open Jayesh-Asrani opened 4 months ago

Jayesh-Asrani commented 4 months ago
Steps to reproduce:

  1. Create a realtime table with a sorted column id and a few multi-value (MV) columns.
  2. Ingest wrong data into Pinot for the MV columns, where the values are represented as strings instead of arrays.

e.g. "A": "[1,2,3]"

Observed behaviour:

  1. This record fails to get ingested into Pinot (which is valid) due to the data type mismatch.
  2. However, the same segment is then unable to move into a completed state. Logs below.

    2024/07/16 02:56:02.708 ERROR [MutableSegmentImpl_events_v11__127__157__20240715T1915Z_raw_data_analysis_poc] [events_v11__127__157__20240715T1915Z] failed to index value with inverted_index
    java.lang.IndexOutOfBoundsException: Index 1130 out of bounds for length 10
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100) ~[?:?]
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106) ~[?:?]
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302) ~[?:?]
        at java.base/java.util.Objects.checkIndex(Objects.java:385) ~[?:?]
        at java.base/java.util.ArrayList.get(ArrayList.java:427) ~[?:?]
        at org.apache.pinot.segment.local.realtime.impl.invertedindex.RealtimeInvertedIndex.add(RealtimeInvertedIndex.java:60) ~[startree-pinot-all-1.2.0-ST.40-jar-with-dependencies.jar:1.2.0-ST.40-7689a6d2a3afecbda1413a231e895717cd937513]
        at org.apache.pinot.segment.spi.index.mutable.MutableInvertedIndex.add(MutableInvertedIndex.java:30) ~[startree-pinot-all-1.2.0-ST.40-jar-with-dependencies.jar:1.2.0-ST.40-7689a6d2a3afecbda1413a231e895717cd937513]
        at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.addNewRow(MutableSegmentImpl.java:707) ~[startree-pinot-all-1.2.0-ST.40-jar-with-dependencies.jar:1.2.0-ST.40-7689a6d2a3afecbda1413a231e895717cd937513]
        at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.index(MutableSegmentImpl.java:533) ~[startree-pinot-all-1.2.0-ST.40-jar-with-dependencies.jar:1.2.0-ST.40-7689a6d2a3afecbda1413a231e895717cd937513]
        at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.processStreamEvents(RealtimeSegmentDataManager.java:631) ~[startree-pinot-all-1.2.0-ST.40-jar-with-dependencies.jar:1.2.0-ST.40-7689a6d2a3afecbda1413a231e895717cd937513]
        at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.consumeLoop(RealtimeSegmentDataManager.java:473) ~[startree-pinot-all-1.2.0-ST.40-jar-with-dependencies.jar:1.2.0-ST.40-7689a6d2a3afecbda1413a231e895717cd937513]
        at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager$PartitionConsumer.run(RealtimeSegmentDataManager.java:707) ~[startree-pinot-all-1.2.0-ST.40-jar-with-dependencies.jar:1.2.0-ST.40-7689a6d2a3afecbda1413a231e895717cd937513]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
    2024/07/16 02:56:02.710 ERROR [RealtimeSegmentDataManager_events_v11__73__157__20240715T1859Z] [events_v11__73__157__20240715T1859Z] Caught exception while indexing the record at offset: 479976251 , row: {
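To make the data shape from the reproduction steps concrete, here is a minimal Python sketch of a check a producer could run before publishing records. It only illustrates the string-versus-array mismatch and is not part of Pinot. The function name `find_mv_type_errors` and the column set are hypothetical; only the column name "A" and the value "[1,2,3]" come from the example above.

```python
import json

# Hypothetical set of multi-value columns; the issue only names the example column "A".
MULTI_VALUE_COLUMNS = {"A"}


def find_mv_type_errors(record, mv_columns=MULTI_VALUE_COLUMNS):
    """Return the multi-value columns whose value is present but is not a JSON array."""
    return sorted(
        name
        for name in mv_columns
        if name in record and not isinstance(record[name], list)
    )


# The array is serialized as a string, as in the bad record described above.
bad = json.loads('{"id": 1, "A": "[1,2,3]"}')
# The same values sent as a real JSON array.
good = json.loads('{"id": 1, "A": [1, 2, 3]}')

print(find_mv_type_errors(bad))
print(find_mv_type_errors(good))
```

A check like this only flags the malformed record on the producer side; it does not explain why the consuming segment then fails to complete.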

mayankshriv commented 3 months ago

@xiangfu0 I think your PR https://github.com/apache/pinot/pull/13630 is for https://github.com/apache/pinot/issues/13604 and not this one.

bgl-prahlad commented 2 months ago

I tried to reproduce the issue mentioned here, but the behaviour is as expected. Ingestion fails (when ingesting the bad data in "data-bad.json"), and the segment remains in the CONSUMING state. But once additional records are available, it does transition to the ONLINE state.

See the attached files used for the investigation: pr_13626-config.json, pr_13626-schema.json, data-bad.json, data-good.json, data-good-2.json

bgl-prahlad commented 2 months ago

I noticed that I had not included a sortedColumn in the table config in my previous comment. I added "studentID" as the sortedColumn in the tableIndexConfig, and even in that case the segment transitions to the ONLINE state once it has sufficient entries.
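For readers following along, the sorted-column setting described above would look roughly like the fragment built below. This is a sketch only: it assumes the usual list form of sortedColumn under tableIndexConfig, and it does not reproduce the rest of the attached pr_13626-config.json.

```python
import json

# Only the sorted-column setting is shown; every other table config field is omitted.
table_config_fragment = {
    "tableIndexConfig": {
        "sortedColumn": ["studentID"],
    }
}

# Print the fragment as it would appear inside the table config JSON.
print(json.dumps(table_config_fragment, indent=2))
```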