Open asfimport opened 2 years ago
Uwe Schindler (@uschindler) (migrated from JIRA)
Reproduce line: gradlew :lucene:analysis:integration.tests:test --tests "org.apache.lucene.analysis.tests.TestRandomChains.testRandomChainsWithLargeStrings" -Ptests.jvms=4 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=E4552C7844FC2DA3 -Ptests.file.encoding=US-ASCII
Robert Muir (@rmuir) (migrated from JIRA)
This one may be caused by KoreanNumberFilter in the chain. I reproduced a failure without KoreanTokenizer (just KoreanNumberFilter). If you look at the filter, it tries to change offsets.
Tomoko Uchida (@mocobeta) (migrated from JIRA)
I noticed this issue when looking at the KoreanTokenizer. We are aggressively refactoring KoreanTokenizer in #11429 and #11529, and I'd like to enable it in TestRandomChains, as with JapaneseTokenizer. According to the stack trace, the problem is KoreanNumberFilter, not KoreanTokenizer...
To give it a try, can I remove the class-level @IgnoreRandomChains from it?
Uwe Schindler (@uschindler) (migrated from JIRA)
Yes. The annotation can be placed on a constructor or on the class. Once you remove it, the class is included in the test. But since the test is randomized, you may need to run it hundreds of times. Use "gradlew beast" to run it in a loop with different random seeds.
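To illustrate the class-level versus constructor-level placement, here is a minimal sketch. It is not Lucene's code: the `IgnoreRandomChains` annotation below is a local stand-in declared only for this example, and `eligibleConstructors`, `WholeClassIgnored` and `OneCtorIgnored` are hypothetical names. It assumes the test picks components by reflection and skips anything carrying the annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

class IgnoreRandomChainsSketch {

  // Local stand-in for the real annotation (assumption: it is visible at
  // runtime and may target either a class or a single constructor).
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.CONSTRUCTOR})
  @interface IgnoreRandomChains {
    String reason();
  }

  // Class-level annotation: every constructor is excluded.
  @IgnoreRandomChains(reason = "hypothetical: component changes offsets")
  static class WholeClassIgnored {
    WholeClassIgnored() {}
  }

  // Constructor-level annotation: only the annotated constructor is excluded.
  static class OneCtorIgnored {
    OneCtorIgnored() {}

    @IgnoreRandomChains(reason = "hypothetical: this variant is broken")
    OneCtorIgnored(int unused) {}
  }

  // Hypothetical helper: returns the constructors a randomized test could use.
  static List<Constructor<?>> eligibleConstructors(Class<?> clazz) {
    List<Constructor<?>> eligible = new ArrayList<>();
    if (clazz.isAnnotationPresent(IgnoreRandomChains.class)) {
      return eligible; // whole class is ignored
    }
    for (Constructor<?> ctor : clazz.getDeclaredConstructors()) {
      if (!ctor.isAnnotationPresent(IgnoreRandomChains.class)) {
        eligible.add(ctor);
      }
    }
    return eligible;
  }

  public static void main(String[] args) {
    System.out.println(eligibleConstructors(WholeClassIgnored.class).size());
    System.out.println(eligibleConstructors(OneCtorIgnored.class).size());
  }
}
```

Removing the class-level annotation in this sketch makes all of the class's constructors eligible again, which is the effect described above.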
Uwe Schindler (@uschindler) (migrated from JIRA)
I suspect that the reproduce line no longer works, sorry. Trying in a loop is better.
Tomoko Uchida (@mocobeta) (migrated from JIRA)
Thanks, let me take a look.
It looks like KoreanTokenizer is causing this (NORI), but Kuromoji may be affected in the same way:
Migrated from LUCENE-10359 by Uwe Schindler (@uschindler), updated Apr 09 2022