elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.2k stars 24.84k forks source link

[CI] AssertionError: Accounting breaker not reset #30290

Closed DaveCTurner closed 5 years ago

DaveCTurner commented 6 years ago

CI output: https://internal-ci.elastic.co/job/elastic+x-pack-elasticsearch+6.2+multijob-windows-compatibility/682/console

REPRODUCE WITH: gradlew :x-pack-elasticsearch:qa:smoke-test-monitoring-with-watcher:integTestRunner \
  -Dtests.seed=9700B8DD0A76FCC8 \
  -Dtests.class=org.elasticsearch.smoketest.PreventFailingBuildIT \
  -Dtests.method="testSoThatTestsDoNotFail" \
  -Dtests.security.manager=true \
  -Dtests.locale=en-IE \
  -Dtests.timezone=Africa/Tripoli

Stack trace:

java.lang.AssertionError: Accounting breaker not reset to 0 on node: {node-0}{V8XMZi9FQLOkQk79I8dxvw}{raz_TS_pQISiiXGUBcPMuA}{127.0.0.1}{127.0.0.1:21360}{testattr=test}
Expected: <0L>
     but: was <2250L>
    at __randomizedtesting.SeedInfo.seed([9700B8DD0A76FCC8:3B7E65085E8C56E4]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.elasticsearch.test.ExternalTestCluster.ensureEstimatedStats(ExternalTestCluster.java:171)
    at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:582)
    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2085)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:965)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.lang.Thread.run(Thread.java:748)

I see nothing else suspicious in the test output or in the logs on the worker. Sorry, not a lot to go on here.

elasticmachine commented 6 years ago

Pinging @elastic/es-core-infra

jpountz commented 6 years ago

This happened in a different test today: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/6494/consoleFull

14:45:41   2> REPRODUCE WITH: ./gradlew :x-pack:plugin:rollup:internalClusterTest -Dtests.seed=9C130591FD311456 -Dtests.class=org.elasticsearch.xpack.rollup.RollupIT -Dtests.method="testTwoJobsStartStopDeleteOne" -Dtests.security.manager=true -Dtests.locale=el-CY -Dtests.timezone=Brazil/Acre
14:45:41 FAILURE 1.76s | RollupIT.testTwoJobsStartStopDeleteOne <<< FAILURES!
14:45:41    > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_sd3
14:45:41    > Expected: <0L>
14:45:41    >      but: was <4775L>
14:45:41    >   at __randomizedtesting.SeedInfo.seed([9C130591FD311456:BA93A46A17310921]:0)
14:45:41    >   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
14:45:41    >   at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2016)
14:45:41    >   at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
14:45:41    >   at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2047)
14:45:41    >   at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:590)
14:45:41    >   at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2163)
polyfractal commented 6 years ago

Another one on that rollup test today:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/6596/console

./gradlew :x-pack:plugin:rollup:internalClusterTest \
  -Dtests.seed=2CD38EA34BC1EAAC \
  -Dtests.class=org.elasticsearch.xpack.rollup.RollupIT \
  -Dtests.method="testTwoJobsStartStopDeleteOne" \
  -Dtests.security.manager=true \
  -Dtests.locale=sl-SI \
  -Dtests.timezone=America/Miquelon

When I get a chance I'll see if there's anything in common between the watcher test and this rollup one.

droberts195 commented 6 years ago

And again on the rollup test:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/2014/console

./gradlew :x-pack:plugin:rollup:internalClusterTest \
  -Dtests.seed=1D037D5E37846F9 \
  -Dtests.class=org.elasticsearch.xpack.rollup.RollupIT \
  -Dtests.method="testTwoJobsStartStopDeleteOne" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-CL \
  -Dtests.timezone=Pacific/Truk

The assertion is this:

16:24:57 FAILURE 7.00s | RollupIT.testTwoJobsStartStopDeleteOne <<< FAILURES!
16:24:57    > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s0
16:24:57    > Expected: <0L>
16:24:57    >      but: was <7042L>
16:24:57    >   at __randomizedtesting.SeedInfo.seed([1D037D5E37846F9:2750962E09785B8E]:0)
16:24:57    >   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
16:24:57    >   at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2023)
16:24:57    >   at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
16:24:57    >   at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2054)
16:24:57    >   at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:586)
16:24:57    >   at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2153)
16:24:57    >   at java.lang.Thread.run(Thread.java:748)
polyfractal commented 6 years ago

I looked into this a bit... no idea what's going on. The Rollup test isn't anything special (edit: and doesn not index any docs, like the earlier auditing failures were doing), and that watcher test is literally just:

public class PreventFailingBuildIT extends ESIntegTestCase {

    public void testSoThatTestsDoNotFail() {
        // Noop

        // This is required because if tests are not enable no
        // tests will be run in the entire project and all tests will fail.
    }
}

I'm likely going to delete all of RollupIT for unrelated reasons (see https://github.com/elastic/elasticsearch/issues/31258#issuecomment-397440302) but we can leave it enabled for now to debug this problem if desired, since it seems to be failing on that one Rollup test in particular. I can't reproduce locally though.

vladimirdolzhenko commented 6 years ago

same again

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos/2536/console

00:51:11 FAILURE 6.42s | RollupIT.testTwoJobsStartStopDeleteOne <<< FAILURES!
00:51:11    > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s0
00:51:11    > Expected: <0L>
00:51:11    >      but: was <11421L>
00:51:11    >   at __randomizedtesting.SeedInfo.seed([698DCF0963902073:4F0D6EF289903D04]:0)
00:51:11    >   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
00:51:11    >   at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2020)
00:51:11    >   at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
00:51:11    >   at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2051)
00:51:11    >   at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:586)
00:51:11    >   at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2159)
00:51:11    >   at java.lang.Thread.run(Thread.java:748)
dnhatn commented 6 years ago

Another instance https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=java10,ES_RUNTIME_JAVA=java10,nodes=virtual&&linux/158/console

16:10:08 FAILURE 4.58s J1 | ShrinkIndexIT.testShrinkIndexPrimaryTerm <<< FAILURES!
16:10:08    > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s2
16:10:08    > Expected: <0L>
16:10:08    >      but: was <1322L>
16:10:08    >   at __randomizedtesting.SeedInfo.seed([2A7F48E21D6F8FBF:F37D7B03379AF2]:0)
16:10:08    >   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
16:10:08    >   at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2020)
16:10:08    >   at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
16:10:08    >   at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2051)
16:10:08    >   at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:592)
16:10:08    >   at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2185)
16:10:08    >   at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
16:10:08    >   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:10:08    >   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
16:10:08    >   at java.base/java.lang.Thread.run(Thread.java:844)
imotov commented 6 years ago

Failed again a few times over the weekend https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=fedora/1168/console This time in different ShrinkIndexIT tests.

java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_sd2
Expected: <0L>
     but: was <1254L>
    at __randomizedtesting.SeedInfo.seed([814E471A446BC55F:ABC272835A33D012]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2013)
    at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
    at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2044)
    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:592)
    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2179)
    at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:965)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.lang.Thread.run(Thread.java:748)
javanna commented 6 years ago

Another ShrinkIndexIT failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/2165 . I haven't muted the tests as this seems like a more general problem that can't be addressed by muting all the possibly affected tests.

danielmitterdorfer commented 6 years ago

This one is also in ShrinkIndexIT: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/1791/console

albertzaharovits commented 6 years ago

Piling another one here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java8,nodes=virtual&&linux/223/consoleFull for the BasicWatcherTests.testConditionSearchWithIndexedTemplate test. Relevant log lines:

05:52:24   1> [2018-08-07T13:52:22,004][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] after test
05:52:24 FAILURE 1.10s J1 | BasicWatcherTests.testConditionSearchWithIndexedTemplate <<< FAILURES!
05:52:24    > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s0
05:52:24    > Expected: <0L>
05:52:24    >      but: was <740L>
05:52:24    >   at __randomizedtesting.SeedInfo.seed([831554F4DAAB5B36:27A252D4EB6139B8]:0)
05:52:24    >   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
05:52:24    >   at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2012)
05:52:24    >   at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
05:52:24    >   at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2043)
05:52:24    >   at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:593)
05:52:24    >   at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2186)
05:52:24    >   at java.lang.Thread.run(Thread.java:748)

and this happened earlier (might be misleading, though):

05:52:24   1> org.elasticsearch.env.ShardLockObtainFailedException: [.watches][0]: obtaining shard lock timed out after 0ms
05:52:24   1>   at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:671) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:590) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:536) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:487) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.indices.IndicesService.deleteIndexStoreIfDeletionAllowed(IndicesService.java:722) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:709) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:615) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:284) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:218) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:492) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
05:52:24   1>   at java.lang.Iterable.forEach(Iterable.java:75) [?:1.8.0_181]
05:52:24   1>   at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:489) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
jasontedor commented 6 years ago

@dakrone Would you look at these build failures?

dakrone commented 6 years ago

Yes I'll look into these today

dakrone commented 6 years ago

I muted the assertion while I diagnose this, so that it doesn't occur any more for now.

s1monw commented 6 years ago

@dakrone did you find a solution for this?

dakrone commented 6 years ago

@s1monw sorry I missed your message, I have not found a solution for this yet

dakrone commented 5 years ago

latest failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+internalClusterTest/370/console

droberts195 commented 5 years ago

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/362/consoleText contains another case of this:

   > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s1, are there still Lucene indices around?
   > Expected: <0L>
   >      but: was <2508L>
   >    at __randomizedtesting.SeedInfo.seed([4AA7930E8DA3CCA6:602BA69793FBD9EB]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2458)
   >    at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:94)
   >    at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2493)
   >    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:593)
   >    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2191)
   >    at java.lang.Thread.run(Thread.java:748)
dakrone commented 5 years ago

I've opened a couple of PRs for helping to track this down, #38807 and #38903, I would appreciate anyone taking a look at them.

droberts195 commented 5 years ago

Is it the case that "Accounting breaker not reset" generally means "An index existed when we expected no indices to exist"?

38952 is obviously a case of this, but it cannot explain the early failures of this test as the .ml-annotations index was only introduced in 6.6 and this issue pre-dates that. But looking at the tests that did cause early failures of this assertion, they're all in areas of the product that might create indices or change the lifetime of indices.

If "Accounting breaker not reset" generally means "An index existed when we expected no indices to exist" then I think more people will be able to contribute to fixing this, as it probably just means that some of the test suites for parts of the product that create indices need more advanced post test cleanup code.

jpountz commented 5 years ago

Is it the case that "Accounting breaker not reset" generally means "An index existed when we expected no indices to exist"?

Memory is given back to the circuit breaker via a segment core (the immutable part of a segment, ie. everything but deletes and dv updates) closed listener. So the meaning is rather that an index was still open.

Our logic regarding closing indices is a bit subtle due to ref-counting: if there is an ongoing request while a reader is closed, then the close() call only decrements a counter and the reader is effectively closed when the ongoing search request releases the reader. Only then do close listeners run.

I wonder that this may cause race conditions if we check the memory used by the accounting breaker while the reader has been closed but close listeners have not run yet. This would explain why we are seeing this problem more in tests that have long asynchronous tasks that increment the ref count like shrinking, rollups and Watcher.

I have tried to think about how we could somehow wait for all listeners to have run, but this looks tricky. Maybe we should look at whether we still get failures if we put this assertion under an assertBusy?

dakrone commented 5 years ago

Maybe we should look at whether we still get failures if we put this assertion under an assertBusy?

I opened a PR to do that in #39211

markharwood commented 5 years ago

Build stats&_a=(columns:!(_source),filters:!(),index:e58bf320-7efd-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:'stacktrace:%22Accounting%20breaker%20not%20reset%20to%200%22'),sort:!(time,desc))) tracking

hub-cap commented 5 years ago

another case, https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob-unix-compatibility/os=sles-12/59/console

REPRODUCE WITH: ./gradlew :server:integTest \
  -Dtests.seed=382E6963024BB1D4 \
  -Dtests.class=org.elasticsearch.action.admin.indices.create.ShrinkIndexIT \
  -Dtests.method="testShrinkIndexPrimaryTerm" \
  -Dtests.security.manager=true \
  -Dtests.locale=be-BY \
  -Dtests.timezone=SystemV/AST4ADT \
  -Dcompiler.java=11 \
  -Druntime.java=8
davidkyle commented 5 years ago

SplitIndexIT failed with this error.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+internalClusterTest/2354/console

The assertBusy did not help:

java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s0, are there still Lucene indices around?
...

Expected: <0L>
     but: was <1326L>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.elasticsearch.test.InternalTestCluster.lambda$ensureEstimatedStats$31(InternalTestCluster.java:2419)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:850)
        ... 40 more
    Suppressed: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s0, are there still Lucene indices around?
Expected: <0L>
     but: was <1326L>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.elasticsearch.test.InternalTestCluster.lambda$ensureEstimatedStats$31(InternalTestCluster.java:2419)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:850)
        ... 40 more

Here's the reproduce line

./gradlew :server:integTest \
  -Dtests.seed=F93D4074E6F9AF6F \
  -Dtests.class=org.elasticsearch.action.admin.indices.create.SplitIndexIT \
  -Dtests.method="testSplitIndexPrimaryTerm" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-CR \
  -Dtests.timezone=America/Argentina/San_Juan \
  -Dcompiler.java=11 \
  -Druntime.java=8
romseygeek commented 5 years ago

Two more SplitIndexIT failures: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+internalClusterTest/2803/console https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=java8fips,nodes=immutable&&linux&&docker/82/console

droberts195 commented 5 years ago

Some more failures in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=zulu11,nodes=immutable&&linux&&docker/127/console, this time in ShrinkIndexIT:

  2> java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_sd4, are there still Lucene indices around?
    Expected: <0L>
         but: was <1394L>
hendrikmuhs commented 5 years ago

one more in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=java12,nodes=immutable&&linux&&docker/134/consoleFull for SplitIndexIT

org.elasticsearch.action.admin.indices.create.SplitIndexIT > testSplitIndexPrimaryTerm FAILED
    java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_sd3, are there still Lucene indices around?
    Expected: <0L>
         but: was <1256L>
REPRODUCE WITH: ./gradlew :server:integTest --tests "org.elasticsearch.action.admin.indices.create.SplitIndexIT.testSplitIndexPrimaryTerm" -Dtests.seed=BCF154492FFC0AA9 -Dtests.security.manager=true -Dtests.locale=sq-MK -Dtests.timezone=America/Toronto -Dcompiler.java=12 -Druntime.java=12
cbuescher commented 5 years ago

Another one today: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=debian-8/369/console

dnhatn commented 5 years ago

I've opened a couple of PRs for helping to track this down, #38807 and #38903, I would appreciate anyone taking a look at them.

@dakrone I can take this one.

elasticmachine commented 5 years ago

Pinging @elastic/es-distributed

dnhatn commented 5 years ago

Another instance with SplitIndexIT: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.1+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=java8,nodes=immutable&&linux&&docker/32/console

./gradlew :server:integTest --tests "org.elasticsearch.action.admin.indices.create.SplitIndexIT.testSplitIndexPrimaryTerm" \
  -Dtests.seed=AED85B1C36454065 \
  -Dtests.security.manager=true \
  -Dtests.locale=es-US \
  -Dtests.timezone=Etc/GMT+12 \
  -Dcompiler.java=12 \
  -Druntime.java=8
dnhatn commented 5 years ago

I am on it.

dnhatn commented 5 years ago

This is a Lucene issue. I'll open the fix soon.

dnhatn commented 5 years ago

Another instance https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/1517/console.

dnhatn commented 5 years ago

I opened https://issues.apache.org/jira/browse/LUCENE-8809.

dnhatn commented 5 years ago

Talked to @DaveCTurner on another channel, we prefer to leave this issue open until we have a new Lucene snapshot.

dnhatn commented 5 years ago

This issue was fixed on 7.4+ as LUCENE-8809 was integrated on master and 7.x. Unfortunately, we are unlikely to have a new minor Lucene release for 7.2 and 7.3. We can upgrade to Lucene-7.7.2 on the 6.8 branch once the 7.3 branch is no longer used for releases.

I am closing this issue and leaving the accounting circuit breaker disabled on 6.8, 7.2 and 7.3.