apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.17k stars 1.21k forks source link

bugfix: prevent background merges on the realtime lucene index #13050

Closed itschrispeck closed 2 weeks ago

itschrispeck commented 2 weeks ago

https://github.com/apache/pinot/pull/12744 changed the Lucene index build process to commit the realtime Lucene index, and reuse it in the immutable segment.

.commit() is called at the beginning of segment conversion, and the Lucene index directory is copied to another location. Since IndexWriter can merge segments in the background even without insertions/updates to the index, these merges non-deterministically can trigger a FileNotFoundException:

java.lang.RuntimeException: Caught exception while instantiating the LuceneTextIndexCreator for column: __mergedTextIndex 
     at org.apache.pinot.segment.local.segment.creator.impl.text.LuceneTextIndexCreator.<init>(LuceneTextIndexCreator.java:149) 
<...>
Caused by: java.lang.RuntimeException: Failed to copy the mutable lucene index: java.io.FileNotFoundException: File system element for parameter 'source' does not exist: '/.../dataDir/table_REALTIME/consumers/table__0__3170__20240430T2313Z/col.lucene.v9.index/_3r.si' 
     at org.apache.pinot.segment.local.segment.creator.impl.text.LuceneTextIndexCreator.convertMutableSegment(LuceneTextIndexCreator.java:192) ...

By using NoMergeScheduler, the async merges will never occur. This patch was tested in our staging/prod env and the intermittent segment build failures disappeared.

tag: bugfix

codecov-commenter commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 62.17%. Comparing base (59551e4) to head (ef290b9). Report is 396 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #13050 +/- ## ============================================ + Coverage 61.75% 62.17% +0.41% + Complexity 207 198 -9 ============================================ Files 2436 2502 +66 Lines 133233 136658 +3425 Branches 20636 21179 +543 ============================================ + Hits 82274 84961 +2687 - Misses 44911 45397 +486 - Partials 6048 6300 +252 ``` | [Flag](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | | |---|---|---| | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: | | [integration](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: | | [integration1](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-0.01%)` | :arrow_down: | | [integration2](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | | | [java-11](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.13% <100.00%> (+0.42%)` | :arrow_up: | | [java-21](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.05% <100.00%> (+0.42%)` | :arrow_up: | | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.15% <100.00%> (+0.40%)` | :arrow_up: | | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.00% <100.00%> (+34.27%)` | :arrow_up: | | [temurin](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.17% <100.00%> (+0.41%)` | :arrow_up: | | [unittests](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `62.16% <100.00%> (+0.41%)` | :arrow_up: | | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.69% <0.00%> (-0.20%)` | :arrow_down: | | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/13050/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.96% <100.00%> (+0.22%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jackie-Jiang commented 2 weeks ago

I saw this test being flaky. Do you think it is related?

Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.580 s <<< FAILURE! -- in org.apache.pinot.segment.local.realtime.impl.invertedindex.LuceneMutableTextIndexTest
Error:  org.apache.pinot.segment.local.realtime.impl.invertedindex.LuceneMutableTextIndexTest.testQueries -- Time elapsed: 0.289 s
Error:  org.apache.pinot.segment.local.realtime.impl.invertedindex.LuceneMutableTextIndexTest.testQueryCancellationIsSuccessful -- Time elapsed: 0.032 s <<< FAILURE!
org.testng.TestException: 

The exception was thrown with the wrong message: expected ".*TEXT_MATCH query timeout on realtime consuming segment.*" but got "java.util.concurrent.RejectedExecutionException: java.lang.InterruptedException"
    at org.testng.internal.invokers.ExpectedExceptionsHolder.wrongException(ExpectedExceptionsHolder.java:69)
    at org.testng.internal.invokers.TestInvoker.considerExceptionsInternal(TestInvoker.java:885)
    at org.testng.internal.invokers.TestInvoker.considerExceptions(TestInvoker.java:860)
    at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:741)
    at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:230)
    at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:63)
    at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:992)
    at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:203)
    at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:154)
    at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:134)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at org.testng.TestRunner.privateRun(TestRunner.java:739)
    at org.testng.TestRunner.run(TestRunner.java:614)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:421)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:413)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:373)
    at org.testng.SuiteRunner.run(SuiteRunner.java:312)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1274)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1208)
    at org.testng.TestNG.runSuites(TestNG.java:1112)
    at org.testng.TestNG.run(TestNG.java:1079)
    at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:155)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:102)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:91)
    at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:137)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
    at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.RejectedExecutionException: java.lang.InterruptedException
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.apache.pinot.segment.local.realtime.impl.invertedindex.LuceneMutableTextIndexTest.testQueryCancellationIsSuccessful(LuceneMutableTextIndexTest.java:105)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:141)
    at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:686)
    ... 27 more
Caused by: java.util.concurrent.RejectedExecutionException: java.lang.InterruptedException
    at org.apache.pinot.common.utils.ScalingThreadPoolExecutor$ForceQueuePolicy.rejectedExecution(ScalingThreadPoolExecutor.java:93)
    at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:841)
    at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1376)
    at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:145)
    at org.apache.pinot.segment.local.realtime.impl.invertedindex.RealtimeLuceneTextIndex.getDocIds(RealtimeLuceneTextIndex.java:149)
    at org.apache.pinot.segment.local.realtime.impl.invertedindex.LuceneMutableTextIndexTest.lambda$testQueryCancellationIsSuccessful$0(LuceneMutableTextIndexTest.java:103)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.InterruptedException
    at java.base/java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly(ReentrantLock.java:159)
    at java.base/java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:372)
    at java.base/java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:332)
    at org.apache.pinot.common.utils.ScalingThreadPoolExecutor$ForceQueuePolicy.rejectedExecution(ScalingThreadPoolExecutor.java:90)
    ... 9 more
itschrispeck commented 2 weeks ago

I don't think it's related, but is probably caused by this behavior: https://julien.ponge.org/blog/not-all-java-single-threaded-executors-are-created-equal-a-java-finalizer-horror-story/

Did you see the flakiness is Java 21? It should be fixed there, but could misbehave in the suites run with 11/17.

Do you mind if I put the fix into another PR since it's not related to the fix for the reuse feature, or want me to bundle it into this one? (there's a couple other nits I'd rather bundle together if no preference).

chenboat commented 2 weeks ago

A general question: if we turn off merge, will it result in too many Lucene segments for realtime part and cause query perf issues?

itschrispeck commented 2 weeks ago

A general question: if we turn off merge, will it result in too many Lucene segments for realtime part and cause query perf issues?

In theory it could, but in practice Pinot's typical segment sizes prevent this (>5-10GB segments isn't normal), and we don't see a query perf regression with even a small 10MB buffer size and 1.5GB segment size.

FDs are probably more of a concern for systems w/ low limits, but I think can be addressed if needed w/ lucene buffer size/compound file format/Pinot segment size tuning - all of which have existing configs.