apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Failures in RealtimeQuickStart when indexes are changed #10675

Closed gortiz closed 1 year ago

gortiz commented 1 year ago

RealtimeQuickStart seems to fail when indexes are changed (either created, updated or removed).

In order to reproduce it:

The behavior is not 100% consistent; sometimes a retry is needed.

I've been able to replicate this with both master (866c796bd56cf846b654f29f024f3e610557b2c7) and release-0.12.1. On master, the following log is printed:

java.lang.IllegalStateException: Failed to find table config for table: githubEvents_OFFLINE
    at com.google.common.base.Preconditions.checkState(Preconditions.java:518) ~[guava-20.0.jar:?]
    at org.apache.pinot.plugin.minion.tasks.BaseTaskExecutor.getTableConfig(BaseTaskExecutor.java:51) ~[classes/:?]
    at org.apache.pinot.plugin.minion.tasks.realtimetoofflinesegments.RealtimeToOfflineSegmentsTaskExecutor.convert(RealtimeToOfflineSegmentsTaskExecutor.java:124) ~[classes/:?]
    at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:217) ~[classes/:?]
    at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:77) ~[classes/:?]
    at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:157) [classes/:?]
    at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:118) [classes/:?]
    at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [helix-core-1.0.4.jar:1.0.4]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]

This log is not printed in 0.12.1.

Before change:

Screenshot 2023-04-24 at 09 58 59

After inverted index is removed:

Screenshot 2023-04-24 at 10 00 16

Also, all older segments get deleted and new ones are created. The backups of the old ones are still indexed, while the new ones are not.

gortiz commented 1 year ago

I also tried to remove/create an inverted index in airlineStats, specifically in the Origin column, and it doesn't seem to break anything, so maybe the problem is only related to the github events table.
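For reference, the kind of index change described here can be sketched as an edit to the table config JSON. This is only an illustration: it assumes the inverted index is declared under `tableIndexConfig.invertedIndexColumns`, the helper name `set_inverted_index` is hypothetical, and the sample config is a trimmed-down stand-in rather than the real airlineStats config.

```python
import copy

def set_inverted_index(table_config, column, enabled):
    """Return a copy of table_config with the inverted index on `column` added or removed."""
    updated = copy.deepcopy(table_config)
    index_config = updated.setdefault("tableIndexConfig", {})
    columns = list(index_config.get("invertedIndexColumns") or [])
    if enabled and column not in columns:
        columns.append(column)
    elif not enabled:
        columns = [c for c in columns if c != column]
    index_config["invertedIndexColumns"] = columns
    return updated

# Hypothetical, trimmed-down config used only for illustration.
airline_stats = {
    "tableName": "airlineStats",
    "tableIndexConfig": {"invertedIndexColumns": ["Origin"]},
}

without_index = set_inverted_index(airline_stats, "Origin", enabled=False)
with_index = set_inverted_index(without_index, "Origin", enabled=True)
print(without_index["tableIndexConfig"]["invertedIndexColumns"])
print(with_index["tableIndexConfig"]["invertedIndexColumns"])
```

The updated config would still have to be submitted to the cluster and the segments reloaded for the change to apply to existing segments.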

Jackie-Jiang commented 1 year ago

Per the exception message, I don't think it is related to the index change. The exception is thrown from RealtimeToOfflineSegmentsTask, which expects both the REALTIME and OFFLINE tables to exist. We should just remove this task from the table config.
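A rough sketch of what removing the task could look like, assuming the table config is handled as plain JSON and the task is declared under `task.taskTypeConfigsMap`. The helper name `remove_task` and the sample config values are hypothetical; this is not the actual quickstart config.

```python
import copy
import json

def remove_task(table_config, task_type):
    """Return a copy of table_config without the given minion task type."""
    updated = copy.deepcopy(table_config)
    task_map = updated.get("task", {}).get("taskTypeConfigsMap", {})
    task_map.pop(task_type, None)
    # Drop the "task" section entirely once no task types are left in it.
    if not task_map:
        updated.pop("task", None)
    return updated

# Hypothetical, trimmed-down realtime table config used only for illustration.
sample_config = {
    "tableName": "githubEvents",
    "tableType": "REALTIME",
    "task": {
        "taskTypeConfigsMap": {
            "RealtimeToOfflineSegmentsTask": {"bucketTimePeriod": "1h"}
        }
    },
}

cleaned = remove_task(sample_config, "RealtimeToOfflineSegmentsTask")
print(json.dumps(cleaned, indent=2))
```

With the task gone, the minion no longer has a reason to look up `githubEvents_OFFLINE`, which is the lookup that fails in the stack trace.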

shounakmk219 commented 1 year ago

I had a look at this. From the description, it looks like there are 2 issues:

  1. The one @Jackie-Jiang pointed out: the exception log is caused by the RealtimeToOfflineSegmentsTask, which is unable to find the corresponding offline table.
  2. The one @gortiz is pointing to, where segments are deleted and indexes behave oddly after an index update + segment reload.

gortiz commented 1 year ago

> The data in githubEvents table is from 2021 and the table config has retention set to 1 year so reload is deleting all the segments

Very good catch! That would explain why we only see this behavior in this test and not in production and honestly it will make me sleep better :)
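The arithmetic behind that explanation can be sketched as follows. This is only an illustration of the date comparison, not Pinot's actual retention code; the function name `is_past_retention` and the chosen dates are assumptions (the "now" value is the date of the screenshots above, and the segment end time is the latest a 2021 record could be).

```python
from datetime import datetime, timedelta, timezone

def is_past_retention(segment_end, retention, now):
    """True when the segment's newest record is older than the retention window."""
    return segment_end < now - retention

retention = timedelta(days=365)  # "1 year", as described above
now = datetime(2023, 4, 24, tzinfo=timezone.utc)  # assumed: date of the screenshots
segment_end = datetime(2021, 12, 31, tzinfo=timezone.utc)  # assumed: latest possible 2021 data

print(is_past_retention(segment_end, retention, now))  # prints True
```

Even the newest possible 2021 segment falls outside a one-year window by April 2023, so every segment in the table qualifies for deletion, whereas a production table with fresh data would not hit this.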