mikemccand opened 10 months ago
Quick update: it turns out some of those failures were in fact a bug in Lucene! So please ignore the failures about `RamUsageEstimator`; see the small discussion here. It really is valuable to Lucene that we run CI on many different JVMs ...
But I think the above exception is possibly J9-specific.
@mikemccand Thanks for letting us know. Do you have instructions on how we can reproduce the issue here?
Hi @tajila -- normally the Lucene build failures have a nice `Reproduce with: ...` line which you can copy/paste into a Lucene 9.x clone. But this particular build failure is missing that for some reason.
That said, if you clone Lucene's `branch_9x` from https://github.com/apache/lucene, possibly roll back to a revision around August 2, 2023, and run:
./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced"
It will try to reproduce. However, I just tried that with OpenJ9 (openjdk full version "17.0.8.1+1", downloaded from the IBM developerWorks site) and it fails to reproduce for me (the test uses threads, which often leads to unreproducible seeds).
When this happens (the failure does not reproduce in one run) we can sometimes add another parameter, `-Ptests.iters=N`, to run the test many times, and it may reproduce the failure. I just did that:
./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced" -Ptests.iters=100
And the failure reproduced!
org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:FB994C7CD25C034A]} FAILED
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
2> NOTE: reproduce with: gradlew test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.seed=F7B4CD7A5624D5EC -Dtests.locale=en-DK -Dtests.timezone=HST -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
2> NOTE: test params are: codec=CheapBastard, sim=Asserting(RandomSimilarity(queryNorm=false): {field=DFR GLZ(0.3)}), locale=en-DK, timezone=HST
2> NOTE: Linux 6.4.11-arch2-1 amd64/IBM Corporation 17.0.8.1 (64-bit)/cpus=32,threads=1,free=398591152,total=529530880
2> NOTE: All tests run in this JVM: [TestIndexWriterThreadsToSegments]
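For context on why the `AssertionError` above surfaces beneath `CyclicBarrier.dowait`: the test's invariant check runs as a barrier action. Here is a minimal, hypothetical sketch of that pattern (not the actual test code; `segmentCount` and the threshold are made up for illustration):

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierSketch {
  static final int THREADS = 2;
  static final AtomicInteger segmentCount = new AtomicInteger();

  public static void main(String[] args) throws Exception {
    // The barrier action runs in whichever thread trips the barrier last;
    // an AssertionError thrown here propagates out of await()/dowait(),
    // which matches the trace above (CheckSegmentCount.run under dowait).
    CyclicBarrier barrier = new CyclicBarrier(THREADS, () -> {
      if (segmentCount.get() > THREADS) {
        throw new AssertionError("too many segments");
      }
    });
    Thread[] workers = new Thread[THREADS];
    for (int i = 0; i < THREADS; i++) {
      workers[i] = new Thread(() -> {
        segmentCount.incrementAndGet(); // simulate flushing one segment
        try {
          barrier.await();
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      t.join();
    }
    System.out.println("barrier passed; segments=" + segmentCount.get());
  }
}
```

Because the barrier action runs after all threads arrive, a miscompilation affecting any one worker can trip the shared assertion, which is also why the failure depends on thread timing and rarely reproduces from the seed alone.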
This doesn't repro with OpenJDK 21 (I had to remove the `-Xgcpolicy:balanced` flag).
Thanks @mikemccand I was able to reproduce this with:
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.jenkins.BuildJDK17x86-64linuxNightly)
Eclipse OpenJ9 VM (build master-b614f6cd7f7, JRE 17 Linux amd64-64-Bit Compressed References 20231103_611 (JIT enabled, AOT enabled)
OpenJ9 - b614f6cd7f7
OMR - 3cdf41f96fc
JCL - aa764da8110 based on jdk-17.0.9+9)
Great!
Here is another possibly J9 specific failure: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45480/
The log file org.apache.lucene.index.TestIndexSorting.txt is copied below:
> org.apache.lucene.index.CheckIndex$CheckIndexException: vector term=[0 0 0] field=sparse_text does not exist in postings; doc=66
> at __randomizedtesting.SeedInfo.seed([741EFE91CE6EF9D7:2AEA93B992E44A5B]:0)
> at app//org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:3700)
> at app//org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1013)
> at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:721)
> at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:559)
> at app//org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:343)
> at app//org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:909)
> at app//org.apache.lucene.util.IOUtils.close(IOUtils.java:85)
> at app//org.apache.lucene.util.IOUtils.close(IOUtils.java:72)
> at app//org.apache.lucene.index.TestIndexSorting.testIndexSortWithSparseField(TestIndexSorting.java:2769)
> at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base@17.0.5/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@17.0.5/java.lang.reflect.Method.invoke(Method.java:568)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
> at java.base@17.0.5/java.lang.Thread.run(Thread.java:857)
2> NOTE: reproduce with: gradlew test --tests TestIndexSorting.testIndexSortWithSparseField -Dtests.seed=741EFE91CE6EF9D7 -Dtests.multiplier=3 -Dtests.locale=ksb-TZ -Dtests.timezone=Africa/Gaborone -Dtests.asserts=true -Dtests.file.encoding=UTF-8
2> NOTE: leaving temporary files on disk at: /home/jenkins/workspace/Lucene-main-Linux/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexSorting_741EFE91CE6EF9D7-001
2> NOTE: test params are: codec=Asserting(Lucene99): {field=PostingsFormat(name=LuceneVarGapFixedInterval), docs=PostingsFormat(name=Asserting), sparse_text=PostingsFormat(name=LuceneVarGapFixedInterval), norms=PostingsFormat(name=Asserting), positions=PostingsFormat(name=LuceneVarGapFixedInterval), id=PostingsFormat(name=Asserting), term_vectors=BlockTreeOrds(blocksize=128)}, docValues:{multi_valued_long=DocValuesFormat(name=Lucene90), double=DocValuesFormat(name=Asserting), foo=DocValuesFormat(name=Asserting), numeric=DocValuesFormat(name=Asserting), sparse_binary=DocValuesFormat(name=Lucene90), sort=DocValuesFormat(name=Asserting), float=DocValuesFormat(name=Lucene90), multi_valued_numeric=DocValuesFormat(name=Asserting), int=DocValuesFormat(name=Lucene90), long=DocValuesFormat(name=Lucene90), sparse_int=DocValuesFormat(name=Asserting), bar=DocValuesFormat(name=Lucene90), sorted=DocValuesFormat(name=Lucene90), multi_valued_double=DocValuesFormat(name=Asserting), sparse=DocValuesFormat(name=Lucene90), multi_valued_string=DocValuesFormat(name=Lucene90), bytes=DocValuesFormat(name=Lucene90), binary=DocValuesFormat(name=Lucene90), dense_int=DocValuesFormat(name=Lucene90), id=DocValuesFormat(name=Lucene90), multi_valued_int=DocValuesFormat(name=Asserting), multi_valued_bytes=DocValuesFormat(name=Asserting), multi_valued_float=DocValuesFormat(name=Lucene90)}, maxPointsInLeafNode=1114, maxMBSortInHeap=6.390174474628566, sim=Asserting(RandomSimilarity(queryNorm=false): {field=DFR GBZ(0.3), positions=IB SPL-L2, id=DFR I(ne)B3(800.0), term_vectors=DFR I(ne)B2}), locale=ksb-TZ, timezone=Africa/Gaborone
2> NOTE: Linux 6.2.0-35-generic amd64/IBM Corporation 17.0.5 (64-bit)/cpus=16,threads=1,free=224949352,total=268435456
2> NOTE: All tests run in this JVM: [TestDemo, TestAnalysisSPILoader, TestCharFilter, TestStopFilter, TestPackedTokenAttributeImpl, TestDeflateWithPresetDictCompressionMode, TestLucene90CompoundFormat, TestLucene90NormsFormatMergeInstance, TestMSBVLong, TestLucene99HnswQuantizedVectorsFormat, TestPerFieldDocValuesFormat, TestDocument, TestFieldType, TestLatLonDocValuesMultiPointPointQueries, TestLatLonMultiPointShapeQueries, TestLatLonPointShapeDVQueries, TestLongDistanceFeatureQuery, TestSortedSetDocValuesSetQuery, TestXYPointShapeDVQueries, TestCircle, TestPoint2D, TestXYCircle, Test2BDocs, Test2BSortedDocValuesFixedSorted, TestAllFilesDetectMismatchedChecksum, TestBagOfPositions, TestByteSliceReader, TestConcurrentMergeScheduler, TestDeletionPolicy, TestDocIDMerger, TestDocsWithFieldSet, TestDuelingCodecsAtNight, TestFieldUpdatesBuffer, TestFilterMergePolicy, TestIndexCommit, TestIndexSorting]
There's some OpenJ9 specific discussion about this build failure: https://lists.apache.org/thread/svt6bqqwdkb4kq7b9zhx630n4sj27ovq
This is unrelated to OpenJ9, it is caused by errorprone not working with forked compilers.
Whoops, thanks for the correction @uschindler.
This one looks maybe unique to OpenJ9?
https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/14098
2> NOTE: reproduce with: gradlew test --tests TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields -Dtests.seed=4CB1759046D1E087 -Dtests.multiplier=3 -Dtests.locale=bo-IN -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8
2> ཟླ་༡༡ 21, 2023 5:38:35 སྔ་དྲོ་ com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #0,5,TGRP-TestLucene90DocValuesFormat]
2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.RuntimeException: Invalid vInt detected (too many bits)
2> at __randomizedtesting.SeedInfo.seed([4CB1759046D1E087]:0)
2> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
2> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
2> Caused by: java.lang.RuntimeException: Invalid vInt detected (too many bits)
2> at org.apache.lucene.store.ByteArrayDataInput.readVInt(ByteArrayDataInput.java:128)
2> at org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.next(Lucene90DocValuesProducer.java:1100)
2> at org.apache.lucene.index.TermsEnumIndex.next(TermsEnumIndex.java:95)
2> at org.apache.lucene.index.OrdinalMap.<init>(OrdinalMap.java:269)
2> at org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:176)
2> at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:663)
2> at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:146)
2> at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:154)
2> at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:178)
2> at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
2> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
2> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5177)
2> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4710)
2> at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6465)
2> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
2> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
2>
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=875, name=Lucene Merge Thread #0, state=RUNNABLE, group=TGRP-TestLucene90DocValuesFormat]
> at __randomizedtesting.SeedInfo.seed([4CB1759046D1E087:7938F85897C76D67]:0)
>
> Caused by:
> org.apache.lucene.index.MergePolicy$MergeException: java.lang.RuntimeException: Invalid vInt detected (too many bits)
> at __randomizedtesting.SeedInfo.seed([4CB1759046D1E087]:0)
> at app//org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
> at app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
>
> Caused by:
> java.lang.RuntimeException: Invalid vInt detected (too many bits)
> at app//org.apache.lucene.store.ByteArrayDataInput.readVInt(ByteArrayDataInput.java:128)
> at app//org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDict.next(Lucene90DocValuesProducer.java:1100)
> at app//org.apache.lucene.index.TermsEnumIndex.next(TermsEnumIndex.java:95)
> at app//org.apache.lucene.index.OrdinalMap.<init>(OrdinalMap.java:269)
> at app//org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:176)
> at app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:663)
> at app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:146)
> at app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:154)
> at app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:178)
> at app//org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
> at app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
> at app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5177)
> at app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4710)
> at app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6465)
> at app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
> at app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
2> NOTE: reproduce with: gradlew test --tests TestLucene90DocValuesFormat.testSortedMergeAwayAllValues -Dtests.seed=4CB1759046D1E087 -Dtests.multiplier=3 -Dtests.locale=bo-IN -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8
2> NOTE: leaving temporary files on disk at: /home/jenkins/workspace/Lucene-9.x-Linux/lucene/core/build/tmp/tests-tmp/lucene.codecs.lucene90.TestLucene90DocValuesFormat_4CB1759046D1E087-001
2> NOTE: test params are: codec=CheapBastard, sim=Asserting(RandomSimilarity(queryNorm=true): {fieldname=DFR I(ne)L2, docId=IB SPL-DZ(0.3), id=LM Jelinek-Mercer(0.100000)}), locale=bo-IN, timezone=Asia/Jerusalem
2> NOTE: Linux 6.2.0-36-generic amd64/IBM Corporation 11.0.20.1 (64-bit)/cpus=16,threads=1,free=134479232,total=470810624
2> NOTE: All tests run in this JVM: [TestExternalCodecs, TestAnalyzerWrapper, TestCharacterUtils, TestWordlistLoader, TestSimpleAttributeImpl, TestFastCompressionMode, TestLucene90DocValuesFormat]
This one looks maybe unique to OpenJ9?
Yes, it is. This is clearly a bug: it looks like OpenJ9 is flipping bits in the vInt encoding due to some problem in its code optimizations. It only happens with OpenJ9 and is an ongoing issue (it happens quite often).
The above run also produced other errors. What is interesting: once OpenJ9 gets into an invalid state, it then produces more and more follow-up errors in tests. For example, the same run also broke LZ4 compression:
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([4CB1759046D1E087:200409C91DCA2BF1]:0)
at app//org.apache.lucene.util.compress.LZ4.decompress(LZ4.java:116)
at app//org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)
at app//org.apache.lucene.codecs.compressing.AbstractTestCompressionMode.decompress(AbstractTestCompressionMode.java:77)
at app//org.apache.lucene.codecs.compressing.AbstractTestCompressionMode.decompress(AbstractTestCompressionMode.java:71)
at app//org.apache.lucene.codecs.compressing.AbstractTestCompressionMode.test(AbstractTestCompressionMode.java:131)
at app//org.apache.lucene.codecs.compressing.AbstractTestCompressionMode.test(AbstractTestCompressionMode.java:126)
at app//org.apache.lucene.codecs.compressing.AbstractTestCompressionMode.testExtremelyLargeInput(AbstractTestCompressionMode.java:163)
The main issue here looks like `ByteArrayDataInput` was miscompiled, so reading from it produces incorrectly decoded integers/bytes/shorts. After that, downstream code such as LZ4 decompression or vInt reading (vInts are ints encoded similarly to UTF-8 sequences) hits assertions.
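To illustrate the encoding involved: a vInt stores 7 data bits per byte, with the high bit marking continuation, so a well-formed 32-bit value never needs more than 5 bytes. Below is a self-contained sketch of the scheme (not the actual Lucene source; class and method names are made up), including the kind of "too many bits" guard that a single flipped continuation bit would trip:

```java
public class VIntSketch {
  // Write `value` as a vInt into buf starting at pos; returns the new position.
  static int writeVInt(byte[] buf, int pos, int value) {
    while ((value & ~0x7F) != 0) {
      buf[pos++] = (byte) ((value & 0x7F) | 0x80); // 7 data bits + continuation bit
      value >>>= 7;
    }
    buf[pos++] = (byte) value; // final byte has the high bit clear
    return pos;
  }

  // Read a vInt from buf at pos[0], advancing pos[0]; rejects over-long input.
  static int readVInt(byte[] buf, int[] pos) {
    byte b = buf[pos[0]++];
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      if (shift > 28) {
        // A 32-bit int fits in 5 vInt bytes; anything longer is corrupt,
        // analogous to the "Invalid vInt detected (too many bits)" above.
        throw new RuntimeException("Invalid vInt detected (too many bits)");
      }
      b = buf[pos[0]++];
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  public static void main(String[] args) {
    byte[] buf = new byte[5];
    writeVInt(buf, 0, 300);
    int[] pos = {0};
    System.out.println("roundtrip: " + readVInt(buf, pos)); // decodes back to 300
  }
}
```

A single bit flipped by a miscompiled read (say, a spurious continuation bit on the final byte) makes the decoder keep consuming bytes until the guard fires, which matches the exception seen in the merge thread.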
@tajila I am investigating the problem in the GitHub issue.
@tajila how did you ensure that the `jdk.jfr.consumer` package was recognized during the compilation process for Lucene when reproducing the test?
root@c24919v1:~/openj9_issues_18400/lucene# $JAVA_HOME/bin/java -version
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed References 20231124_000000 (JIT enabled, AOT enabled)
OpenJ9 - 461bf3c70
OMR - 5eee6ad9d
JCL - 3699725139c based on jdk-17.0.9+9)
root@c24919v1:~/openj9_issues_18400/lucene# ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced" -Ptests.iters=100 --info --scan
Initialized native services in: /root/.gradle/native
Initialized jansi services in: /root/.gradle/native
Received JVM installation metadata from '/root/openj9_issues_18400/openj9-openjdk-jdk17/build/linux-x86_64-server-release/images/jdk': {JAVA_HOME=/root/openj9_issues_18400/openj9-openjdk-jdk17/build/linux-x86_64-server-release/images/jdk, JAVA_VERSION=17.0.9-internal, JAVA_VENDOR=Eclipse OpenJ9, RUNTIME_NAME=OpenJDK Runtime Environment, RUNTIME_VERSION=17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17, VM_NAME=Eclipse OpenJ9 VM, VM_VERSION=openj9-0.41.0, VM_VENDOR=Eclipse OpenJ9, OS_ARCH=amd64}
The client will now receive all logging from the daemon (pid: 3401592). The daemon log file: /root/.gradle/daemon/8.4/daemon-3401592.out.log
Starting 11th build in daemon [uptime: 31 mins 36.069 secs, performance: 98%]
Using 4 worker leases.
Now considering [/root/openj9_issues_18400/lucene, /root/openj9_issues_18400/lucene/dev-tools/missing-doclet, /root/openj9_issues_18400/lucene/buildSrc] as hierarchies to watch
Now considering [/root/openj9_issues_18400/lucene/buildSrc, /root/openj9_issues_18400/lucene, /root/openj9_issues_18400/lucene/dev-tools/missing-doclet] as hierarchies to watch
Watching the file system is configured to be enabled if available
File system watching is active
Starting Build
Settings evaluated using settings file '/root/openj9_issues_18400/lucene/settings.gradle'.
Now considering [/root/openj9_issues_18400/lucene/dev-tools/missing-doclet, /root/openj9_issues_18400/lucene/buildSrc, /root/openj9_issues_18400/lucene] as hierarchies to watch
Projects loaded. Root project using build file '/root/openj9_issues_18400/lucene/build.gradle'.
Included projects: [root project 'lucene-root', project ':lucene', project ':lucene:analysis', project ':lucene:analysis.tests', project ':lucene:backward-codecs', project ':lucene:benchmark', project ':lucene:classification', project ':lucene:codecs', project ':lucene:core', project ':lucene:core.tests', project ':lucene:demo', project ':lucene:distribution', project ':lucene:distribution.tests', project ':lucene:documentation', project ':lucene:expressions', project ':lucene:facet', project ':lucene:grouping', project ':lucene:highlighter', project ':lucene:join', project ':lucene:luke', project ':lucene:memory', project ':lucene:misc', project ':lucene:monitor', project ':lucene:queries', project ':lucene:queryparser', project ':lucene:replicator', project ':lucene:sandbox', project ':lucene:spatial-extras', project ':lucene:spatial-test-fixtures', project ':lucene:spatial3d', project ':lucene:suggest', project ':lucene:test-framework', project ':lucene:analysis:common', project ':lucene:analysis:icu', project ':lucene:analysis:kuromoji', project ':lucene:analysis:morfologik', project ':lucene:analysis:morfologik.tests', project ':lucene:analysis:nori', project ':lucene:analysis:opennlp', project ':lucene:analysis:phonetic', project ':lucene:analysis:smartcn', project ':lucene:analysis:stempel']
> Configure project :missing-doclet
Evaluating project ':missing-doclet' using build file '/root/openj9_issues_18400/lucene/dev-tools/missing-doclet/build.gradle'.
Registering project ':missing-doclet' in composite build. Will substitute for module 'org.apache.lucene.tools:missing-doclet'.
> Configure project :buildSrc
Evaluating project ':buildSrc' using build file '/root/openj9_issues_18400/lucene/buildSrc/build.gradle'.
Resolve mutations for :buildSrc:compileJava (Thread[Execution worker,5,main]) started.
:buildSrc:compileJava (Thread[Execution worker,5,main]) started.
> Task :buildSrc:compileJava FAILED
Caching disabled for task ':buildSrc:compileJava' because:
Build cache is disabled
Task ':buildSrc:compileJava' is not up-to-date because:
Task has failed previously.
The input changes require a full rebuild for incremental task ':buildSrc:compileJava'.
Watching 13 directories to track changes
Watching 12 directories to track changes
Watching 11 directories to track changes
Watching 10 directories to track changes
Full recompilation is required because no incremental change information is available. This is usually caused by clean builds or changing compiler arguments.
Compiling with toolchain '/root/openj9_issues_18400/openj9-openjdk-jdk17/build/linux-x86_64-server-release/images/jdk'.
Compiling with JDK Java compiler API.
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:30: error: package jdk.jfr.consumer does not exist
import jdk.jfr.consumer.RecordedClass;
^
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:31: error: package jdk.jfr.consumer does not exist
import jdk.jfr.consumer.RecordedEvent;
^
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:32: error: package jdk.jfr.consumer does not exist
import jdk.jfr.consumer.RecordedFrame;
^
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:33: error: package jdk.jfr.consumer does not exist
import jdk.jfr.consumer.RecordedMethod;
^
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:34: error: package jdk.jfr.consumer does not exist
import jdk.jfr.consumer.RecordedStackTrace;
^
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:35: error: package jdk.jfr.consumer does not exist
import jdk.jfr.consumer.RecordedThread;
^
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:36: error: package jdk.jfr.consumer does not exist
import jdk.jfr.consumer.RecordingFile;
^
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:48: error: cannot find symbol
static String frameToString(RecordedFrame frame, boolean lineNumbers) {
^
symbol: class RecordedFrame
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:101: error: cannot find symbol
static boolean isInteresting(String mode, RecordedEvent event) {
^
symbol: class RecordedEvent
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:116: error: cannot find symbol
static boolean isGradlePollThread(RecordedThread thread) {
^
symbol: class RecordedThread
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:121: error: cannot find symbol
static long getValue(RecordedEvent event) {
^
symbol: class RecordedEvent
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:50: error: cannot find symbol
RecordedMethod method = frame.getMethod();
^
symbol: class RecordedMethod
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:51: error: cannot find symbol
RecordedClass clazz = method.getType();
^
symbol: class RecordedClass
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:170: error: cannot find symbol
try (RecordingFile recording = new RecordingFile(Paths.get(file))) {
^
symbol: class RecordingFile
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:170: error: cannot find symbol
try (RecordingFile recording = new RecordingFile(Paths.get(file))) {
^
symbol: class RecordingFile
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:172: error: cannot find symbol
RecordedEvent event = recording.readEvent();
^
symbol: class RecordedEvent
location: class ProfileResults
/root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java:176: error: cannot find symbol
RecordedStackTrace trace = event.getStackTrace();
^
symbol: class RecordedStackTrace
location: class ProfileResults
Note: /root/openj9_issues_18400/lucene/buildSrc/src/main/java/org/apache/lucene/gradle/Checksum.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17 errors
Watching 11 directories to track changes
Watching 12 directories to track changes
Watching 13 directories to track changes
Watching 14 directories to track changes
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':buildSrc:compileJava'.
> Compilation failed; see the compiler error output for details.
BUILD FAILED in 3s
1 actionable task: 1 executed
Publishing a build scan to scans.gradle.com requires accepting the Gradle Terms of Service defined at https://gradle.com/terms-of-service. Do you accept these terms? [yes, no] yes
Gradle Terms of Service accepted.
Publishing build scan...
<-------------> 4% CONFIGURING [1m 3s]
https://gradle.com/s/ietctyxbc2nlm
I am working on a solution to disable Java Flight Recorder support when the module isn't there. This is just a build-system issue and unrelated to the issues here.
During Lucene builds we always use Hotspot VMs, but you can run the test suite with any other JVM. Just pass `RUNTIME_JAVA_HOME` when invoking Gradle. `JAVA_HOME` should be a Hotspot JDK and is solely used to run Gradle.
Hi, I created https://github.com/apache/lucene/pull/12845 to work around the J9 VM not having Java Flight Recorder by checking for the module:
>gradlew :lucene:core:test
> Configure project :buildSrc
Module jdk.jfr is not available; skipping compilation of Java Flight Recorder support.
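The idea behind that message is to probe the boot module layer for `jdk.jfr` before compiling the JFR-dependent sources. A minimal standalone sketch of that check (hypothetical; the actual logic lives in the Gradle build of PR #12845, not in a class like this):

```java
public class JfrCheck {
  public static void main(String[] args) {
    // ModuleLayer.boot() describes the modules the running JVM was started
    // with; OpenJ9 builds without Flight Recorder simply lack jdk.jfr.
    boolean hasJfr = ModuleLayer.boot().findModule("jdk.jfr").isPresent();
    if (hasJfr) {
      System.out.println("jdk.jfr available; JFR profiling support can be compiled.");
    } else {
      System.out.println(
          "Module jdk.jfr is not available; skipping compilation of Java Flight Recorder support.");
    }
  }
}
```

Probing the module layer at runtime avoids hard-coding vendor checks, so the same build works on Hotspot, OpenJ9, and any other JVM regardless of which optional modules it ships.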
Code was merged; if you update Lucene's main branch, you should be able to test this with pure OpenJ9 (also to run Gradle).
I was able to reproduce the test failure in `TestIndexWriterThreadsToSegments` but not the test failure in `TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields`.
$JAVA_HOME/bin/java -version
java version "17.0.9" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 17.0.9+11-LTS-201)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.9+11-LTS-201, mixed mode, sharing)
$RUNTIME_JAVA_HOME/bin/java -version
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed References 20231124_000000 (JIT enabled, AOT enabled)
OpenJ9 - 461bf3c70
OMR - 5eee6ad9d
JCL - 3699725139c based on jdk-17.0.9+9)
The code was merged; if you update Lucene's main branch, you should be able to test this with pure OpenJ9 (also to run Gradle).
$JAVA_HOME/bin/java -version
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed References 20231124_000000 (JIT enabled, AOT enabled)
OpenJ9 - 461bf3c70
OMR - 5eee6ad9d
JCL - 3699725139c based on jdk-17.0.9+9)
./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced" -Ptests.iters=100
...
> Task :lucene:core:test
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by java.lang.System
WARNING: Please consider reporting this to the maintainers of java.lang.System
WARNING: System::setSecurityManager will be removed in a future release
org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:E0E420713DA7EABA]} FAILED
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2434, name=Thread-2402, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2436, name=Thread-2404, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2435, name=Thread-2403, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2440, name=Thread-2408, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2441, name=Thread-2409, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2439, name=Thread-2407, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2433, name=Thread-2401, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2438, name=Thread-2406, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2442, name=Thread-2410, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
Caused by:
java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
Caused by:
java.util.concurrent.BrokenBarrierException
at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
...
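A note on reading this output: the many BrokenBarrierException errors are downstream of the single AssertionError. When the CyclicBarrier's barrier action (here, CheckSegmentCount.run()) throws, the barrier is placed in the broken state and every other thread waiting in await() gets a BrokenBarrierException. A minimal standalone sketch of that behavior (hypothetical BarrierDemo class, not the Lucene test code):

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BarrierDemo {
    public static void main(String[] args) throws Exception {
        // Barrier action that throws, standing in for the AssertionError
        // thrown by CheckSegmentCount.run() in the Lucene test.
        CyclicBarrier barrier = new CyclicBarrier(2, () -> {
            throw new RuntimeException("check failed");
        });

        Thread other = new Thread(() -> {
            try {
                barrier.await();
            } catch (BrokenBarrierException e) {
                // Every other waiting party observes this once the action throws.
                System.out.println("other thread: BrokenBarrierException");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        other.start();
        Thread.sleep(200); // let 'other' reach the barrier first

        try {
            barrier.await(); // trips the barrier; the failing action runs here
        } catch (RuntimeException e) {
            System.out.println("tripping thread: " + e.getMessage());
        }
        other.join();
        System.out.println("barrier broken: " + barrier.isBroken());
    }
}
```

So only the first AssertionError is the real signal; the BrokenBarrierException cascade is expected collateral.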
@tajila what could be the next step(s) to find the root cause of the problem based on the analysis below?
It seems like creating a micro test that reproduces the issue or identifying the possible commit(s) in the lucene repo that may have caused the issue could be next steps.
- TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails with openj9 jdk17 during 1000 iterations.
- TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails with openj9 jdk21 during 1000 iterations.
- TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails with hotspot jdk17 during 1000 iterations.
- TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails with hotspot jdk21 during 1000 iterations.
- The jvm args of TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom (-Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced") do not seem to affect the test failures with openj9 and hotspot. -Xgcpolicy:balanced was omitted when running the test with hotspot.
- The first failure of TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom seems to be on July 22, 2023: https://lists.apache.org/list?builds@lucene.apache.org:2023-7:Lucene-9.x-Linux%20(64bit/openj9/jdk-17.0.5)
openj9 jdk17
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed References 20231124_000000 (JIT enabled, AOT enabled)
OpenJ9 - 461bf3c70
OMR - 5eee6ad9d
JCL - 3699725139c based on jdk-17.0.9+9)
openj9 jdk21
openjdk version "21.0.1-internal" 2023-10-17
OpenJDK Runtime Environment (build 21.0.1-internal-adhoc.root.openj9-openjdk-jdk21)
Eclipse OpenJ9 VM (build master-1ba18acb3, JRE 21 Linux amd64-64-Bit Compressed References 20231128_000000 (JIT enabled, AOT enabled)
OpenJ9 - 1ba18acb3
OMR - 8b19b8082
JCL - 6f6d9413679 based on jdk-21.0.1+12)
hotspot jdk17
java version "17.0.9" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 17.0.9+11-LTS-201)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.9+11-LTS-201, mixed mode, sharing)
hotspot jdk21
java version "21.0.1" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 21.0.1+12-LTS-29)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.1+12-LTS-29, mixed mode, sharing
I would start by looking at what causes the assertion failure
at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
Since you have access to the source you can instrument it to see where the discrepancy arises from.
The intermittent assertion failure in testSegmentCountOnFlushRandom seems to have started after https://github.com/apache/lucene/commit/bae5b338ed93b352062e6feb8321945fbed11ca5 was added to the lucene repo. A solution is to make the DocumentsWriterPerThreadPool.getAndLock() method synchronized, as that seems to fix a race condition that occurs while multiple threads concurrently add documents to an index during the test.
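As an illustration of the kind of check-then-act race that synchronizing getAndLock() removes, here is a sketch with a hypothetical WriterPool class (not Lucene's actual DocumentsWriterPerThreadPool):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative pool, NOT Lucene's DocumentsWriterPerThreadPool: without the
// synchronized keyword, two threads could race on poll(), or both decide the
// free list is empty and create extra writers, so the number of writers (and
// hence flushed segments) would no longer match the test's expectations.
class WriterPool<T> {
    private final Deque<T> freeList = new ArrayDeque<>();

    // Synchronizing makes the whole check-then-act atomic.
    synchronized T getAndLock(Supplier<T> newWriter) {
        T writer = freeList.poll();   // check + take in one critical section
        return (writer != null) ? writer : newWriter.get();
    }

    synchronized void release(T writer) {
        freeList.push(writer);
    }
}

public class PoolDemo {
    public static void main(String[] args) {
        WriterPool<StringBuilder> pool = new WriterPool<>();
        StringBuilder a = pool.getAndLock(StringBuilder::new);
        pool.release(a);
        StringBuilder b = pool.getAndLock(StringBuilder::new);
        System.out.println("reused: " + (a == b));
    }
}
```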
Thanks for the report. Indeed the method can be synchronized; newWriter() is synchronized anyway, so it won't add more contention.
@mikemccand do you want to take the lead as you know IndexWriter better than I do?
I think the more serious issues here are the ones corrupting data while reading; there's no concurrency involved: https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1820631659
I think the more serious issues here are the ones corrupting data while reading; there's no concurrency involved: https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1820631659
I was able to reproduce the failure in TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields in branch_9x in the lucene repo by adding the TEST_JVM_ARGS to the Reproduce with: command:
./gradlew test --tests TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields -Dtests.seed=4CB1759046D1E087 -Dtests.multiplier=3 -Dtests.locale=bo-IN -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced"
$JAVA_HOME/bin/java -version
java version "11.0.20" 2023-07-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.20+9-LTS-256)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.20+9-LTS-256, mixed mode)
$RUNTIME_JAVA_HOME/bin/java -version
openjdk version "11.0.22-internal" 2024-01-16
OpenJDK Runtime Environment (build 11.0.22-internal+0-adhoc.root.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build master-21f742fbc, JRE 11 Linux amd64-64-Bit Compressed References 20231208_000000 (JIT enabled, AOT enabled)
OpenJ9 - 21f742fbc
OMR - 137778af4
JCL - 22b36573fd based on jdk-11.0.22+6)
@tajila the cause of the failure in TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields seems to be due to the -Xgcpolicy option and the JIT:
- The test passes when -Xgcpolicy:balanced is omitted from the jvm args.
- The test passes with -Xgcpolicy: gencon, optavgpause and optthruput.
- The test fails with -Xgcpolicy: balanced, metronome and nogc.
- The test passes with -Xgcpolicy:balanced and -Xgcpolicy:metronome when the JIT is off with -Xint in the jvm args.
@mikemccand do you want to take the lead as you know IndexWriter better than I do?
Thanks @singh264 and @uschindler -- I'll open a Lucene GitHub issue to track this on Lucene's side! Love the open source collaboration here!
@tajila would it be appropriate to request the GC or JIT team to assist in finding the root cause of the failure in TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields, which seems to be related to the GC and JIT components based on https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1850177955?
A workaround for the failure in TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields is to use the default -Xgcpolicy:gencon.
@dmitripivkine requesting your feedback on this issue. This issue indicated two lucene test failures (testSegmentCountOnFlushRandom and testSortedVariableLengthBigVsStoredFields) with the cause that seemed specific to openj9:
@hzongaro pls take a look, https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1850177955 indicates a JIT issue.
@dmitripivkine requesting your feedback on this issue. This issue indicated two lucene test failures (testSegmentCountOnFlushRandom and testSortedVariableLengthBigVsStoredFields) with the cause that seemed specific to openj9:
The fact that Balanced and Metronome work with -Xint pointed to the JIT as the most likely cause. Both of these GC policies require the JIT to be configured differently (mostly disabled optimizations). So I think a JIT investigation is a reasonable next step. If you have a reason to think it is GC related, we need a system core captured at the assertion point. Even with this core it might be hard to connect the failure to GC activity, so an explanation of the detected discrepancies would be appreciated.
A core file was generated with:
./gradlew test --tests TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields -Dtests.seed=4CB1759046D1E087 -Dtests.multiplier=3 -Dtests.locale=bo-IN -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced -Xdump:java+system:events=throw,filter=java/lang/RuntimeException"
> !threads
!stack 0x0001aa00 !j9vmthread 0x0001aa00 !j9thread 0x7f5600007660 tid 0x3ede42 (4120130) // (Test worker)
!stack 0x0001e600 !j9vmthread 0x0001e600 !j9thread 0x7f5600b472a0 tid 0x3ede5c (4120156) // (JIT Compilation Thread-000 Suspended)
!stack 0x00022300 !j9vmthread 0x00022300 !j9thread 0x7f5600b47818 tid 0x3ede5d (4120157) // (JIT Compilation Thread-001)
!stack 0x00025f00 !j9vmthread 0x00025f00 !j9thread 0x7f5600b487b0 tid 0x3ede5e (4120158) // (JIT Compilation Thread-002 Suspended)
!stack 0x00029c00 !j9vmthread 0x00029c00 !j9thread 0x7f5600b48d28 tid 0x3ede5f (4120159) // (JIT Compilation Thread-003 Suspended)
!stack 0x0002d800 !j9vmthread 0x0002d800 !j9thread 0x7f5600b49cc0 tid 0x3ede60 (4120160) // (JIT Compilation Thread-004 Suspended)
!stack 0x00031500 !j9vmthread 0x00031500 !j9thread 0x7f5600b4a238 tid 0x3ede61 (4120161) // (JIT Compilation Thread-005 Suspended)
!stack 0x00035100 !j9vmthread 0x00035100 !j9thread 0x7f5600b4b1d0 tid 0x3ede62 (4120162) // (JIT Compilation Thread-006 Suspended)
!stack 0x00038e00 !j9vmthread 0x00038e00 !j9thread 0x7f5600b4b748 tid 0x3ede63 (4120163) // (JIT Diagnostic Compilation Thread-007 Suspended)
!stack 0x0003ca00 !j9vmthread 0x0003ca00 !j9thread 0x7f5600bab310 tid 0x3ede65 (4120165) // (JIT-SamplerThread)
!stack 0x00040700 !j9vmthread 0x00040700 !j9thread 0x7f5600bab888 tid 0x3ede66 (4120166) // (IProfiler)
!stack 0x0013c500 !j9vmthread 0x0013c500 !j9thread 0x7f5600bdeb10 tid 0x3ede67 (4120167) // (Common-Cleaner)
!stack 0x00221300 !j9vmthread 0x00221300 !j9thread 0x7f5600cded40 tid 0x3ede69 (4120169) // (Dedicated GC Main)
!stack 0x00224f00 !j9vmthread 0x00224f00 !j9thread 0x7f5600cdf2b8 tid 0x3ede6a (4120170) // (GC Worker)
!stack 0x00228c00 !j9vmthread 0x00228c00 !j9thread 0x7f5600cdffd0 tid 0x3ede6b (4120171) // (GC Worker)
!stack 0x0022c800 !j9vmthread 0x0022c800 !j9thread 0x7f5600ce0548 tid 0x3ede6c (4120172) // (GC Worker)
!stack 0x00230500 !j9vmthread 0x00230500 !j9thread 0x7f5600ce1260 tid 0x3ede6d (4120173) // (GC Worker)
!stack 0x00234100 !j9vmthread 0x00234100 !j9thread 0x7f5600ce17d8 tid 0x3ede6e (4120174) // (GC Worker)
!stack 0x00237e00 !j9vmthread 0x00237e00 !j9thread 0x7f5600ce24f0 tid 0x3ede6f (4120175) // (GC Worker)
!stack 0x0023ba00 !j9vmthread 0x0023ba00 !j9thread 0x7f5600ce2a68 tid 0x3ede70 (4120176) // (GC Worker)
!stack 0x00260400 !j9vmthread 0x00260400 !j9thread 0x7f5600ce45f8 tid 0x3ede72 (4120178) // (Attach API wait loop)
!stack 0x00247700 !j9vmthread 0x00247700 !j9thread 0x7f5600ce4080 tid 0x3ede73 (4120179) // (ClassByNameCache Reaper)
!stack 0x003d2c00 !j9vmthread 0x003d2c00 !j9thread 0x7f5600f94610 tid 0x3ede78 (4120184) // (/127.0.0.1:35490 to /127.0.0.1:45649 workers)
!stack 0x003e6900 !j9vmthread 0x003e6900 !j9thread 0x7f5600f94b88 tid 0x3ede79 (4120185) // (/127.0.0.1:35490 to /127.0.0.1:45649 workers Thread 2)
!stack 0x003ea500 !j9vmthread 0x003ea500 !j9thread 0x7f5600fbbb60 tid 0x3ede7a (4120186) // (/127.0.0.1:35490 to /127.0.0.1:45649 workers Thread 3)
!stack 0x004c0b00 !j9vmthread 0x004c0b00 !j9thread 0x7f5600fbc0d8 tid 0x3ede7b (4120187) // (SUITE-TestLucene90DocValuesFormat-seed#[4CB1759046D1E087])
!stack 0x003e7f00 !j9vmthread 0x003e7f00 !j9thread 0x7f55440e5ba0 tid 0x3ede7c (4120188) // (TEST-TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields-seed#[4CB1759046D1E087])
!stack 0x006d3e00 !j9vmthread 0x006d3e00 !j9thread 0x7f55440e6118 tid 0x3ede7e (4120190) // (Finalizer thread)
!stack 0x006d3000 !j9vmthread 0x006d3000 !j9thread 0x7f554050af10 tid 0x3ede7f (4120191) // (Lucene Merge Thread #0) <-----
> !stack 0x006d3000
<6d3000> Generic special frame
<6d3000> !j9method 0x00000000005CD458 org/apache/lucene/store/ByteArrayDataInput.readVInt()I
<6d3000> !j9method 0x000000000073E358 org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer$TermsDict.next()Lorg/apache/lucene/util/BytesRef;
<6d3000> !j9method 0x0000000000739488 org/apache/lucene/index/TermsEnumIndex.next()Lorg/apache/lucene/util/BytesRef;
<6d3000> !j9method 0x000000000073F258 org/apache/lucene/index/OrdinalMap.<init>(Lorg/apache/lucene/index/IndexReader$CacheKey;[Lorg/apache/lucene/index/TermsEnum;Lorg/apache/lucene/index/OrdinalMap$SegmentMap;F)V
<6d3000> !j9method 0x000000000073F238 org/apache/lucene/index/OrdinalMap.build(Lorg/apache/lucene/index/IndexReader$CacheKey;[Lorg/apache/lucene/index/TermsEnum;[JF)Lorg/apache/lucene/index/OrdinalMap;
<6d3000> !j9method 0x00000000005A4090 org/apache/lucene/codecs/DocValuesConsumer.mergeSortedField(Lorg/apache/lucene/index/FieldInfo;Lorg/apache/lucene/index/MergeState;)V
<6d3000> !j9method 0x00000000005A3FF0 org/apache/lucene/codecs/DocValuesConsumer.merge(Lorg/apache/lucene/index/MergeState;)V
<6d3000> !j9method 0x00000000005A4220 org/apache/lucene/codecs/perfield/PerFieldDocValuesFormat$FieldsWriter.merge(Lorg/apache/lucene/index/MergeState;)V
<6d3000> !j9method 0x000000000072C560 org/apache/lucene/index/SegmentMerger.mergeDocValues(Lorg/apache/lucene/index/SegmentWriteState;Lorg/apache/lucene/index/SegmentReadState;)V
<6d3000> !j9method 0x0000000000743F98 org/apache/lucene/index/SegmentMerger$$Lambda$205/0x0000000000000000.merge(Lorg/apache/lucene/index/SegmentWriteState;Lorg/apache/lucene/index/SegmentReadState;)V
<6d3000> !j9method 0x000000000072C680 org/apache/lucene/index/SegmentMerger.mergeWithLogging(Lorg/apache/lucene/index/SegmentMerger$VoidMerger;Lorg/apache/lucene/index/SegmentWriteState;Lorg/apache/lucene/index/SegmentReadState;Ljava/lang/String;I)V
<6d3000> !j9method 0x000000000072C520 org/apache/lucene/index/SegmentMerger.merge()Lorg/apache/lucene/index/MergeState;
<6d3000> !j9method 0x000000000048E798 org/apache/lucene/index/IndexWriter.mergeMiddle(Lorg/apache/lucene/index/MergePolicy$OneMerge;Lorg/apache/lucene/index/MergePolicy;)I
<6d3000> !j9method 0x000000000048E618 org/apache/lucene/index/IndexWriter.merge(Lorg/apache/lucene/index/MergePolicy$OneMerge;)V
<6d3000> !j9method 0x0000000000632738 org/apache/lucene/index/IndexWriter$IndexWriterMergeSource.merge(Lorg/apache/lucene/index/MergePolicy$OneMerge;)V
<6d3000> !j9method 0x000000000045ED98 org/apache/lucene/index/ConcurrentMergeScheduler.doMerge(Lorg/apache/lucene/index/MergeScheduler$MergeSource;Lorg/apache/lucene/index/MergePolicy$OneMerge;)V
<6d3000> !j9method 0x00000000006D83F8 org/apache/lucene/index/ConcurrentMergeScheduler$MergeThread.run()V
<6d3000> JNI call-in frame
<6d3000> Native method frame
> !stackslots 0x006d3000
<6d3000> *** BEGIN STACK WALK, flags = 00400001 walkThread = 7,155,712 ***
<6d3000> ITERATE_O_SLOTS
<6d3000> RECORD_BYTECODE_PC_OFFSET
<6d3000> Initial values: walkSP = 0x00000000006F95C8, PC = 0x0000000000000001, literals = 0x0000000000000000, A0 = 0x00000000006F95E0, j2iFrame = 0x0000000000000000, decomp = 0x0000000000000000
<6d3000> Generic special frame: bp = 0x00000000006F95E0, sp = 0x00000000006F95C8, pc = 0x0000000000000001, cp = 0x0000000000000000, arg0EA = 0x00000000006F95E0, flags = 0x0000000000000000
<6d3000> Bytecode frame: bp = 0x00000000006F95F8, sp = 0x00000000006F95E8, pc = 0x00007F5540119215, cp = 0x00000000005CD4E0, arg0EA = 0x00000000006F9610, flags = 0x0000000000000000
<6d3000> Method: org/apache/lucene/store/ByteArrayDataInput.readVInt()I !j9method 0x00000000005CD458
<6d3000> Bytecode index = 173
<6d3000> Using local mapper
<6d3000> Locals starting at 0x00000000006F9610 for 0x0000000000000003 slots
<6d3000> I-Slot: a0[0x00000000006F9610] = 0x00000000FACADAB0 <-----
<6d3000> I-Slot: t1[0x00000000006F9608] = 0x00000000FFFFFF8F
<6d3000> I-Slot: t2[0x00000000006F9600] = 0x00000000F1E3C78F
<6d3000> Bytecode frame: bp = 0x00000000006F9630, sp = 0x00000000006F9618, pc = 0x00007F552C07B0EA, cp = 0x000000000073D9F0, arg0EA = 0x00000000006F9658, flags = 0x0000000000000000
...
> !j9object 0x00000000FACADAB0
!J9Object 0x00000000FACADAB0 {
struct J9Class* clazz = !j9class 0x5CD000 // org/apache/lucene/store/ByteArrayDataInput
Object flags = 0x00000000;
I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
[B bytes = !fj9object 0xfa780000 (offset = 4) (org/apache/lucene/store/ByteArrayDataInput)
I pos = 0x00001BF3 (offset = 8) (org/apache/lucene/store/ByteArrayDataInput)
I limit = 0x000A179E (offset = 12) (org/apache/lucene/store/ByteArrayDataInput)
}
@BradleyWood, may I ask you to take a look at what are potentially JIT-related failures reported here?
A core file was generated with:
@singh264 I have no access to the file, you should change permissions in the box.
Also, please summarize the problem for this particular core (deeper than the trivial output you have provided).
I have no access to the file, you should change permissions in the box
The core file can now be downloaded with the Google Drive link; I was unable to extend the expiration date of the box link.
point to the exception line in the source code, describe what the expected behaviour is and what is observed instead
A RuntimeException is thrown by ByteArrayDataInput.readVInt() when running the TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields test.
public int readVInt() {
byte b = bytes[pos++];
if (b >= 0) return b;
int i = b & 0x7F;
b = bytes[pos++];
i |= (b & 0x7F) << 7;
if (b >= 0) return i;
b = bytes[pos++];
i |= (b & 0x7F) << 14;
if (b >= 0) return i;
b = bytes[pos++];
i |= (b & 0x7F) << 21;
if (b >= 0) return i;
b = bytes[pos++];
// Warning: the next ands use 0x0F / 0xF0 - beware copy/paste errors:
i |= (b & 0x0F) << 28;
if ((b & 0xF0) == 0) return i;
throw new RuntimeException("Invalid vInt detected (too many bits)");
}
The expected behaviour is that readVInt() reads an integer from the byte array by reading the lower 7 bits of each byte until the most significant bit of a byte is 0 (b >= 0), without reading more than 32 bits in total. The actual behaviour is that the byte array includes 5 consecutive bytes whose most significant bit is 1, which suggests that the integer is stored in more than 32 bits in the byte array:
> !j9object 0x00000000FACADAB0
!J9Object 0x00000000FACADAB0 {
struct J9Class* clazz = !j9class 0x5CD000 // org/apache/lucene/store/ByteArrayDataInput
Object flags = 0x00000000;
I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
[B bytes = !fj9object 0xfa780000 (offset = 4) (org/apache/lucene/store/ByteArrayDataInput) <-----
I pos = 0x00001BF3 (offset = 8) (org/apache/lucene/store/ByteArrayDataInput) // 7155
I limit = 0x000A179E (offset = 12) (org/apache/lucene/store/ByteArrayDataInput)
}
> !j9indexableobject 0x00000000FA780000 7145 7156
!J9IndexableObject 0x00000000FA780000 {
struct J9Class* clazz = !j9arrayclass 0x4A700 // [B
Object flags = 0x00000000;
U_32 size = 0x000A77C7;
[7145] = 13, 0x0d
[7146] = 13, 0x0d
[7147] = 13, 0x0d
[7148] = 143, 0x8f
[7149] = 143, 0x8f
[7150] = 143, 0x8f <----- read
[7151] = 143, 0x8f <----- read
[7152] = 143, 0x8f <----- read
[7153] = 143, 0x8f <----- read
[7154] = 143, 0x8f <----- read
[7155] = 143, 0x8f
To print entire range: !j9indexableobject 0x00000000FA780000 0 686023
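For context on why that run of 0x8f bytes is impossible in well-formed data: encoding any 32-bit value as a vInt takes at most 5 bytes, and the fifth byte carries only 4 payload bits, so its high bit is always clear. A minimal sketch of the encoder side (hypothetical writeVInt helper, mirroring the decoder shown earlier):

```java
import java.io.ByteArrayOutputStream;

public class VIntDemo {
    // Encode: low 7 bits per byte, MSB set on all but the last byte.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The widest 32-bit value (-1 = 0xFFFFFFFF) needs 5 bytes, and the
        // fifth byte is at most 0x0F, so its high bit is never set.
        byte[] max = writeVInt(-1);
        System.out.println("bytes: " + max.length);
        System.out.println("last: " + (max[4] & 0xFF));
        // Hence 5 consecutive bytes that ALL have the MSB set, like the
        // 0x8f 0x8f 0x8f 0x8f 0x8f run in the dump, cannot be a valid vInt.
        System.out.println("last has MSB: " + ((max[4] & 0x80) != 0));
    }
}
```

This is why readVInt() throws "Invalid vInt detected (too many bits)" on that data: the bytes themselves are corrupt, not the decoder.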
any ideas how the test behaviour might be impacted by GC (collected/not collected objects, References processing involved, etc.)?
I'd like guidance on exploring these suggestions to understand how GC-related components impact the test behaviour. I will investigate the utilization of the bytes array with the default -Xgcpolicy:gencon option (where the test passes) and the -Xgcpolicy:balanced option (where the test fails) within the java code, and I will examine the -Xverbosegclog output to know when the GC is invoked, especially where invalid data seems to be written to the bytes array.
One detail specific to Balanced (and Metronome, of course): the object !j9indexableobject 0x00000000FA780000 0 686023 is Discontiguous Hybrid, 0xa77c7 elements total:
> !j9x 0x00000000FA780000,80,4
0xFA780000 : 0004a700 00000000 000a77c7 00000000 [ .........w...... ]
0xFA780010 : 00000000 00000000 fa980000 fa780020 [ ............ .x. ]
0xFA780020 : 00ffffff 00000000 76767600 8c767676 [ .........vvvvvv. ]
0xFA780030 : 00212121 955757d8 00dfdfdf eeaeae7a [ !!!..WW.....z... ]
0xFA780040 : ff6f6f6f cccccc40 f7f7f7f7 bef7f7f7 [ ooo.@........... ]
0xFA780050 : ff8b8b8b 2b2b2b2d 15151515 82151515 [ ....-+++........ ]
0xFA780060 : ff5c5c5c ff9b9b99 00ffffff 36363607 [ \\\..........666 ]
0xFA780070 : 3e3e3e3e 6c3e3e3e ffa5a5a5 70d0d092 [ >>>>>>>l.......p ]
0xFA980000 : 000000003732f200 ffffffffddd91106 [ ..27............ ]
0xFA980010 : ffffffffac4b9c60 ffffffffdba9111b [ `.K............. ]
0xFA980020 : ffffffffb57f9eb4 00000000603fdf79 [ ........y.?`.... ]
0xFA980030 : ffffffff98d36ae7 00000000547952f1 [ .j.......RyT.... ]
0xFA980040 : fffffffff5b14679 ffffffffd28071b3 [ yF.......q...... ]
0xFA980050 : ffffffffedc0d5e3 00000000621a2912 [ .........).b.... ]
0xFA980060 : ffffffffceaa83d3 0000000058b2e4fb [ ...........X.... ]
0xFA980070 : 000000001e51643d 000000001884cdcf [ =dQ............. ]
0xFA980080 : 000000001c213f7d 0000000051e2cd7e [ }?!.....~..Q.... ]
0xFA980090 : 000000003da39a1c ffffffffe9520a91 [ ...=......R..... ]
0xFA9800A0 : ffffffffe97105a8 ffffffffe97bbaca [ ..q.......{..... ]
...
The first 0x80000 elements are stored in the leaf region starting at 0xFA980000; the remainder, 0x277C7 elements, are inlined into the Arraylet spine starting at 0xFA780020.
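The numbers are self-consistent; a quick sketch of the arithmetic (constants taken from the dump above, object header overhead ignored):

```java
public class ArrayletMath {
    public static void main(String[] args) {
        int regionSize = 0x80000;      // 512 KiB Balanced-GC region
        int totalElements = 0xA77C7;   // byte[] length from the dump (686,023)
        int leafElements = regionSize; // one full leaf of byte elements
        int spineInlined = totalElements - leafElements;
        System.out.println("inlined in spine: 0x" + Integer.toHexString(spineInlined));
        // A contiguous byte[] of this size cannot fit in a single 512 KiB
        // region, which is why Balanced stores it as a discontiguous
        // (hybrid) arraylet: one full leaf plus a partial run in the spine.
        System.out.println("fits one region: " + (totalElements <= regionSize));
    }
}
```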
There is no reason to suspect GC activity (the core looks clean from a GC perspective); however, the problem might be on the VM, JIT or test side, where something failed to handle the Discontiguous Hybrid array format and assumed a contiguous layout.
@singh264 Would you please try this experiment: increase the region size from the current 512k (0x80000) to 1m or even higher. I don't know whether the test can use arrays larger than the current 0xa77c7. Please use -Xgc:regionSize=1048576 for 1m, or even larger (the size of the region must be a power of 2).
If the problem goes away with a larger region size, it is a good starting point to investigate where a contiguous array is wrongly assumed.
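If you try sizes other than the ones suggested, -Xgc:regionSize must be a power of two; a quick way to sanity-check a candidate value (standard bit trick, hypothetical helper):

```java
public class RegionSizeCheck {
    // A positive value is a power of two iff it has exactly one bit set,
    // in which case n & (n - 1) clears that bit and yields zero.
    static boolean isPowerOfTwo(long n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println("1m: " + isPowerOfTwo(1048576));      // 2^20
        System.out.println("2m: " + isPowerOfTwo(2097152));      // 2^21
        System.out.println("1500000: " + isPowerOfTwo(1500000)); // not a power of 2
    }
}
```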
Would you please try this experiment - increase region size from current 512k (0x80000) to 1m or even higher. I don't know can test use arrays larger than current 0xa77c7. Please use -Xgc:regionSize=1048576 for 1m or even larger (size of the region must be power of 2)
When running TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields with:
./gradlew test --tests TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields -Dtests.seed=4CB1759046D1E087 -Dtests.multiplier=3 -Dtests.locale=bo-IN -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced"
- The test fails with -Xgc:regionSize=1048576 (1m) and -Xgcpolicy:balanced.
- The test fails with -Xgc:regionSize=2097152 (2m) and -Xgcpolicy:balanced.
- The test fails with -Xgc:regionSize=2097152 (2m) and -Xgcpolicy:metronome.
The test TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields passes with -Xjit:exclude={org/apache/lucene/util/compress/LZ4$HighCompressionHashTable*,java/lang/invoke/*} for both -Xgcpolicy:balanced and -Xgcpolicy:metronome.
./gradlew test --tests TestLucene90DocValuesFormat.testSortedVariableLengthBigVsStoredFields -Dtests.seed=4CB1759046D1E087 -Dtests.multiplier=3 -Dtests.locale=bo-IN -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.jvmargs="-XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:exclude={org/apache/lucene/util/compress/LZ4$HighCompressionHashTable*,java/lang/invoke/*}"
FYI @0xdaryl @hzongaro @jdmpapin @nbhuiyan: earlier in our meeting I referred to this user issue; the failure is most likely related to OpenJ9 MHs, and it is resolved with OJDK MHs as per https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1883940429.
Since the failure is most likely related to OpenJ9 MHs and is resolved with OJDK MHs as per https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1883940429, how can I find out the status of OJDK MHs?
It seems that OJDK MHs currently cannot be enabled until the performance issues are resolved.
How can I find information about the performance issues that need to be resolved?
For JDK8 and JDK11, OJDK MH perf is being tracked through https://github.com/eclipse-openj9/openj9/issues/12728. We are trying different solutions to address the perf issues.
In the meantime, the workarounds mentioned in the previous comments can be used to address the failure. JDK17+ can also be used to address the failure, since OJDK MHs are enabled in JDK17+.
How can I know whether it is currently feasible to use JDK17+ to address the failure, given that OJDK MHs are enabled in JDK17+?
Your previous test in https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1883940429 implies that the issue will be addressed with JDK17+. Nothing more needs to be done.
Hello from the Apache Lucene project, where we run CI builds with various JVMs and versions, including OpenJ9.
We sometimes see CI failures that seem to be OpenJ9-specific, and we don't have many dev resources to pursue them. You can see the recent failures in the Apache mailing list archives. Each test failure includes a Reproduce with: command line that, in theory, will recreate the failure in your dev area. This one looks more interesting:
I'm not sure if this is actionable by any OpenJ9 devs, but I wanted at least to establish some contact between our two projects so that together we could somehow get to the root cause on some of these issues. The root cause might be in Lucene (e.g. failing to properly detect number of bytes for an object pointer in the current JVM -- there is some JVM vendor specific logic I think) or might be something specific to OpenJ9.
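To make the Lucene-side possibility concrete: a RAM estimator has to guess the per-reference size for the running JVM, and vendor-sensitive logic of this kind can misfire on a JVM it does not recognize. Below is a minimal hypothetical sketch of such a probe; the class and method names are illustrative and this is not Lucene's actual RamUsageEstimator code:

```java
// Hypothetical sketch of vendor-sensitive reference-size probing, in the
// spirit of what a RAM estimator must do. NOT Lucene's actual implementation.
public class RefSizeProbe {

    // Guess the size in bytes of an object reference on the running JVM.
    static int guessReferenceSize() {
        // 32-bit JVMs use 4-byte references outright.
        if ("32".equals(System.getProperty("sun.arch.data.model"))) {
            return 4;
        }
        // On 64-bit JVMs, compressed references/oops (typically enabled by
        // default for heaps up to roughly 32 GB on both HotSpot and OpenJ9)
        // shrink references to 4 bytes; otherwise assume full 8-byte pointers.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        return maxHeapBytes <= 32L * 1024 * 1024 * 1024 ? 4 : 8;
    }

    public static void main(String[] args) {
        System.out.println("vm.name  = " + System.getProperty("java.vm.name"));
        System.out.println("ref size = " + guessReferenceSize() + " bytes (guess)");
    }
}
```

A heuristic like this has no way to account for a layout it was never taught, which is why cross-vendor CI runs like Lucene's are useful for flushing out wrong assumptions on either side.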
We seem to be testing this OpenJ9 version: 64bit/openj9/jdk-17.0.5.