apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.65k stars 1.03k forks source link

TestFuzzyQuery.testErrorMessage can sometimes run for a hundred seconds [LUCENE-10268] #11304

Open asfimport opened 2 years ago

asfimport commented 2 years ago

I ran 6 test runs, and it happened only in one of them:

97.93s TestFuzzyQuery.testErrorMessage (:lucene:core)

The test is badly named, but I think the intention is to provoke a determinize work limit. Funny because with the lev logic we build a DFA from the getgo... so it's just the overhead introduced by UTF32toUTF8.

Still, this seems to indicate something is wrong, or possible adversary.

To reproduce: commit hash: 63c89f678d48deb9a40b90498409ad5656d775ce "git checkout 63c89f678d48deb9a40b90498409ad5656d775ce"

57.51 second seed: 4DE463858ECD6FA6 reproduce with: ./gradlew -p lucene/core test --tests TestFuzzyQuery.testErrorMessage -Dtests.seed=4DE463858ECD6FA6

52.95 second seed: 7DE4CCA6F3A8183 reproduce with: ./gradlew -p lucene/core test --tests TestFuzzyQuery.testErrorMessage -Dtests.seed=7DE4CCA6F3A8183

48.21 second seed: 79F0E067AF13B395 reproduce with: ./gradlew -p lucene/core test --tests TestFuzzyQuery.testErrorMessage -Dtests.seed=79F0E067AF13B395

I used jdk-17: openjdk version "17.0.1" 2021-10-19 OpenJDK Runtime Environment (build 17.0.1+12) OpenJDK 64-Bit Server VM (build 17.0.1+12, mixed mode)


Migrated from LUCENE-10268 by Robert Muir (@rmuir), updated Nov 29 2021 Attachments: LUCENE-10268_test.patch

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Don't worry about the loss of seed, I'm running it 100 times and then will upload a couple of them. Maybe there is an obvious simple pattern.

I "beast" for slow seeds like this:

lucene[main]$ for iteration in {0..100}; do ./gradlew -p lucene/core test --tests TestFuzzyQuery.testErrorMessage > log$iteration.txt 2>&1; done
...
$ grep testErrorMessage log*.txt
log0.txt:   1.74s TestFuzzyQuery.testErrorMessage (:lucene:core)
log10.txt:   1.73s TestFuzzyQuery.testErrorMessage (:lucene:core)
log11.txt:   1.50s TestFuzzyQuery.testErrorMessage (:lucene:core)
log12.txt:   1.24s TestFuzzyQuery.testErrorMessage (:lucene:core)
log13.txt:   1.19s TestFuzzyQuery.testErrorMessage (:lucene:core)
log1.txt:   3.50s TestFuzzyQuery.testErrorMessage (:lucene:core)
log2.txt:   2.91s TestFuzzyQuery.testErrorMessage (:lucene:core)
log3.txt:   1.97s TestFuzzyQuery.testErrorMessage (:lucene:core)
log4.txt:   1.74s TestFuzzyQuery.testErrorMessage (:lucene:core)
log5.txt:   4.93s TestFuzzyQuery.testErrorMessage (:lucene:core)
log6.txt:   2.59s TestFuzzyQuery.testErrorMessage (:lucene:core)
log7.txt:  48.21s TestFuzzyQuery.testErrorMessage (:lucene:core)
log8.txt:   1.94s TestFuzzyQuery.testErrorMessage (:lucene:core)
log9.txt:   2.13s TestFuzzyQuery.testErrorMessage (:lucene:core)

Already got a 48second run, I'll let all 100 finish.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I ran the first seed (just tacked on -Dtests.profile=true) and it seems the issue involves sorting?

PERCENT       CPU SAMPLES   STACK
22.55%        8935          org.apache.lucene.util.automaton.Automaton$1#swapOne()
12.11%        4798          org.apache.lucene.util.automaton.Automaton$1#compare()
11.05%        4379          org.apache.lucene.util.automaton.Automaton$1#swap()
9.49%         3759          org.apache.lucene.util.automaton.Lev2TParametricDescription#transition()
5.61%         2225          org.apache.lucene.util.Sorter#reverse()
5.46%         2164          org.apache.lucene.util.automaton.Automaton#addTransition()
5.02%         1990          org.apache.lucene.util.Sorter#mergeInPlace()
4.62%         1831          org.apache.lucene.util.automaton.LevenshteinAutomata$ParametricDescription#unpack()
4.06%         1607          org.apache.lucene.util.automaton.LevenshteinAutomata#getVector()
3.70%         1467          org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()

I re-ran with -Dtests.profile=true -Dtests.profile.stacksize=20 to increase stack depth and try to get a better idea of where all this sorting is coming from. It is difficult because the sort is working recursively, so you need a big number to see what is happening.

PERCENT       CPU SAMPLES   STACK
4.36%         1753          org.apache.lucene.util.automaton.Lev2TParametricDescription#transition()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildMaxEditAutomaton()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$6()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$181+0x000000010025cd28.1457578423#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
                              at org.apache.lucene.util.TestRuleThreadAndTestName$1#evaluate()
4.17%         1675          org.apache.lucene.util.automaton.Lev2TParametricDescription#transition()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildAutomatonSet()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$7()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$185+0x00000001002632d0.1876706731#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
                              at org.apache.lucene.util.TestRuleThreadAndTestName$1#evaluate()
2.37%         953           org.apache.lucene.util.automaton.Automaton$1#compare()
                              at org.apache.lucene.util.Sorter#comparePivot()
                              at org.apache.lucene.util.Sorter#binarySort()
                              at org.apache.lucene.util.Sorter#binarySort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#sort()
                              at org.apache.lucene.util.automaton.Automaton#finishCurrentState()
                              at org.apache.lucene.util.automaton.Automaton#addTransition()
                              at org.apache.lucene.util.automaton.Automaton#addTransition()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildAutomatonSet()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$7()
2.37%         951           org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildMaxEditAutomaton()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$6()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$181+0x000000010025cd28.1457578423#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
                              at org.apache.lucene.util.TestRuleThreadAndTestName$1#evaluate()
                              at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1#evaluate()
2.35%         945           org.apache.lucene.util.automaton.Automaton$1#compare()
                              at org.apache.lucene.util.Sorter#comparePivot()
                              at org.apache.lucene.util.Sorter#binarySort()
                              at org.apache.lucene.util.Sorter#binarySort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#mergeSort()
                              at org.apache.lucene.util.InPlaceMergeSorter#sort()
                              at org.apache.lucene.util.automaton.Automaton#finishCurrentState()
                              at org.apache.lucene.util.automaton.Automaton#addTransition()
                              at org.apache.lucene.util.automaton.Automaton#addTransition()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildMaxEditAutomaton()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$6()
2.08%         837           org.apache.lucene.util.automaton.LevenshteinAutomata#getVector()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildAutomatonSet()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$7()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$185+0x00000001002632d0.1876706731#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
                              at org.apache.lucene.util.TestRuleThreadAndTestName$1#evaluate()
2.07%         831           org.apache.lucene.util.automaton.LevenshteinAutomata$ParametricDescription#unpack()
                              at org.apache.lucene.util.automaton.Lev2TParametricDescription#transition()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildAutomatonSet()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$7()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$185+0x00000001002632d0.1876706731#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
2.01%         806           org.apache.lucene.util.automaton.LevenshteinAutomata$ParametricDescription#unpack()
                              at org.apache.lucene.util.automaton.Lev2TParametricDescription#transition()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildMaxEditAutomaton()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$6()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$181+0x000000010025cd28.1457578423#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
1.97%         790           org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildAutomatonSet()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$7()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$185+0x00000001002632d0.1876706731#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
                              at org.apache.lucene.util.TestRuleThreadAndTestName$1#evaluate()
                              at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1#evaluate()
1.95%         785           org.apache.lucene.util.automaton.LevenshteinAutomata#getVector()
                              at org.apache.lucene.util.automaton.LevenshteinAutomata#toAutomaton()
                              at org.apache.lucene.search.FuzzyAutomatonBuilder#buildMaxEditAutomaton()
                              at org.apache.lucene.search.TestFuzzyQuery#lambda$testErrorMessage$6()
                              at org.apache.lucene.search.TestFuzzyQuery$$Lambda$181+0x000000010025cd28.1457578423#run()
                              at org.apache.lucene.util.LuceneTestCase#_expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.util.LuceneTestCase#expectThrows()
                              at org.apache.lucene.search.TestFuzzyQuery#testErrorMessage()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke0()
                              at jdk.internal.reflect.NativeMethodAccessorImpl#invoke()
                              at jdk.internal.reflect.DelegatingMethodAccessorImpl#invoke()
                              at java.lang.reflect.Method#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner#invoke()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$8#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$9#evaluate()
                              at com.carrotsearch.randomizedtesting.RandomizedRunner$10#evaluate()
                              at org.apache.lucene.util.TestRuleSetupTeardownChained$1#evaluate()
                              at org.apache.lucene.util.AbstractBeforeAfterRule$1#evaluate()
                              at org.apache.lucene.util.TestRuleThreadAndTestName$1#evaluate()

To me the issue seems to be the sorting called by Automaton.addTransition -> Automaton.finishCurrentState. Maybe we use the wrong sort algorithm or call finishCurrentState some unnecessary amount of times, not sure yet.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Here are the strings in question. Sorry if it crashes JIRA or your browser. Each is a string within a single unicode block (its randomRealisticUnicodeString). They are long.

First seed uses CJK ideographs: e.g. U+5066 CJK UNIFIED IDEOGRAPH-5066 Second seed uses Hangul: e.g. U+C2C0 HANGUL SYLLABLE SYI Third seed uses PUA: e.g. U+E599 <private use area-E599>

I'll try to hack the test to use randomSimpleString and see if i can get a slow seed with a hacked-version of the test that only uses ascii.

But this is what i have for now.

57.51 second seed: 4DE463858ECD6FA6 reproduce with: ./gradlew -p lucene/core test --tests TestFuzzyQuery.testErrorMessage -Dtests.seed=4DE463858ECD6FA6

偦帯荓誋橫鍳羠缅羬戻蓢喫奌瑞痶敖蒷艎瑿蟞鸐儮埵虷阈贏膋髧橧霐愥昒輲佫罈緉韜蟻册訖覞泣孨孤孲脬徯扺呺幙襧玴籺圉隉釢掣夕柲墱蹮耨圿褖衢儈渼摢淊挰褛鶍捭皐琷颛詑穭垭櫳擒檥柛嫉纜梶抢廜抋懊稦蔵蜄瞲馽冶錷秧銨殃歨咸亣袤槃阇邇執椩硞津轇苚舎梭蒎氊鵙渨鬱鍔铂飢磻筻桝蕣贙嗡礠芋蘬樏鰏亶箷畻言鑎均吱蚦殕聺鸽壄蘾橇故鰬韡邶粡臈鿱痣盾飰咅库匷直佈歌轧炙搟肻性逹譟倠艌墟蓒霅家喒躷聫刡甓熋蕆籆剶昳鍷狝谻蒝蛽迀評芃酾勒褋偪箞皇錶觖悂晫昃襢愩挔宩鮰垍踱冾赛貴瑞钍睓栟醫釜氜恗跷隔鿶垙髇网蜵觼肌嬐椷絡轎提羆灡僮疄驲睄嘕焞埦翣柑葂徹呸交涥唡跊昘瘸鮨譨傣憷濥某稁獨赫搅鄋謩冂殙絞疚诈桕簇历嚿嬠楌郣縭槯處氄绎嫬椇馘賾矡喹顰豎檪莓鄕颻奧謹駯訑鸠蒛蚍奉郜梯颱狿綇蝗樄蔚嫾絛饺拖敐袯診痕镝墛闩迈嵚鴆雏瞚瑨塭緓伝昃掛挴巸甦熾貔祦朥漌悫肗枦鶛纳闝寵耙艃捲蜓痟叮爾瑸皈凵练辳夘恫烚峄佚杌瀭翟縏氀敟暔礇淇袩銷爊翦湹崺樥沱玩趑儹茢峚酺攮瓅怫鯵愲仡昽诨嶌脡譋玙鳰盙攛蔀軵颉鴬鿊淒蝁娌皘寀稯檬鞮鋖柵礡鹁彑謭渇腥扛浵蘘熺鰀吘埜夤箖娀俍蟺霴匃燨瀓瓾侑続萩嗂墲绯衝峛汚淵术瀉溅跦釬崇鞡喱剞勗怈韾撆坛镝粰妉嶛穦鏨噖蟫顸叹紡龼筆暒緯楑萦磚暟鰉梋邔撎欐餄霙绬伖輹掅篰裯母稪毣衾濒聮狭侄饧謂欷燚瑟膴呩垦螚鰟撰薾镍馇莊鞰瀢吊该罘齬杁鵂煖毺緮儠鏴袷瘩舃缤嘄揳韭肎仛却裾頜谟蠏觋聺袥捻沎漰旃氡髋贂榘篌椪制駶槒醶鴽蕎鳺觋讀偊吙絴欯鳯唭较抖都笮嚴塣硲淙榓箖庬睫庥夆筌矕泓瀞殬馯韽岫磲叽塼桇悃櫆唔蒢诒絰変恈佊雈蟏轺扟棣兀懛犳敦筟葐呿玓袧嬜鏘礮匁魱嚉颩茇鮃踿怢濣唿輾壵瑍薜曌龐亥嶽眯熖恟搅簖齮秹笾串蛠礯浰楑柀鋏洁廿肱讐捿貫鯷匋豜閼藍躬袙衠跜姻烯源庄賵饺滆篂苉筮屫迊燭県吐蛨泾堟烘軖窔掟嚞祫紙瀱镁宾母瘁犰摰瞂囍瑰孠寏缿綏獅侯棡钹鋘煛晬蠪鮴箂狸禤糋姸杼杝繵墓萹髊极鮞坞陰獘鎙畁忑燯傭蓅湪已鳤矕鈸菎偤骴畟拸閲蓾喘猨姃畹缯僙陋孜肠弘誤貟儇墬惦哼搗勶繌偫汸衯攋褐鋉鬯靗溵舒腫营纭揑揠錊犋凈掎禳滖鮎焉冇柡酛釫夅勴禺阙鑑纰験靊譿佭慗戲諔赓碻蛺驮袙凑践緗祳陿觛髧詪髙蝘踌虩彊蜜皖嗋僶蜲嵅宼寝豇媨妧狆涨羱凌慲謏裏讞糱瀾镍釟瀻馈矬擊櫴蘕饕岃癐膍拎俻鸁繨擫垈愡俜恽汩羯啉檚檃鸉钶珣弙諤噹巃臌话脭餋位鷊鐇烏吺桞瀚偀醯嚀踪痖架攼埬摫铍煲瀂芃鐖噶癣佫晱蓨鑇肄細髓蠇愞谦渲傛攛壽砟鮽靧挲腁妆昂遞嶕谲昗獡唁摖偖俏某覶飆擴亪耿人赩岈疦縮甔膌娣噣鴁湀靃罜鰝廚夤崼騃邂读缻矬殄蘠鋕汄炂虚绥搌騴歁框鞘螏丹漼抭让悙鳋湥輤铃黈尉肊亰錽舙畖魸鷿吗阭仕绶楩瑲駣卌槝譾羦莥炄抋昭睜硝闫儾富竞詀鋔鐱蜉茳嵡嶑閬闻鞩悼蝓鮘鋕瞠娬貵鑢貯鼕罾簐钸敄湷腫源嫯栶婥磇皩樔溞鉤膐莡匰鞒鞅厳硫旮沁獗禍筆謚愢棹螻嫡磝筰圖锈堬毑蠙术蘅际餜鯔艵佪肛惤只獞娼藅勔啓淾轎礷遨酤洠鹖偏曪熀爲窡漣芭打扡学虲鞼颰虛凖舸钹糌辜魏銺垈瓴製綁早凶窢扦竵办较瀄皳眅鵂頞巛幖婜罕釺胍貌谵肑蒽瑕篕彀豗疾鿍匊笆暣蔜萡熂唜蜸俆窽薞榖僱桥薃頙孭姻卜跊縡毂縗齑咫鶷庾獀覹俑糉摾忔配鴃臯蠞揷韷萪尒侳唲隄淴敭禊陿儂浽蝼額媠莿俩蔍岁遒詿用麰糯玿夝迆驑欀穽辧存濡嫲蔲詛業仯彥龃曃焪亶獇蕗漄曽稕颾顁韲禐扺詂笇聛鉶糘暍贵桴妉腆莕鲶仚鹋鎐筴缧繕朴蕌彡詋肼爝蠛哰带緘佰浂覬噦瀘辯蛒肳饸韐獊幠饉釋莚伔懈囈启霝蚳淚紖辆傖栶翶残啦浴鬎妆軲笿鯯袄峒濈俵祔瞀鷤橲桓鹟竆懲龛窅锩喕宍暾篐埪漋噊硛全壦羧揁慂膙洸滯臥穲幧救詜懅夲捼爹乙唿毅导顣羶礳恸爎壴查宫煸烑靧榫攼俶涡清鏃乚諾叭徘鰜舠康裥鲎頕慼郋窓渟闣瑤本蕈歵姍狝茠栠讥敗臍忆噛縇芏愧諡垚邽媂捓突愁慣靷芮踽螒鉗殜餟臱晏霶閘憁糴蚯褋樆揦篕礚篔澦蛁怺焸唱琐磷抋砼裝颔玖冊駨靄籅构渆葽幅陲艸顨傖誩噜娹軵祿鏿盟咸姬樜颽栩鞬謴仄萙牱砬峌謵燱琿掲餓艟魇叛斉奂貗蚢匣醻謍駝旄啜砢塞韝稯鹜旷稪珴袄蔬夷衮卦貴稔旆迻雬臚狳蒒槮莾梪莕葞閽擵凓榉鑑庋饡牖拹庖镉略楲蔶裟儤絃扥鋴層昰閷稵胡霂莵笲钦塶举巙謙辧祮鲑豵禪浺遠憅蜑鍃团壴髩劷蘗达尌檊峬踯汅舌笵琼緽蓇譴嫶擹扚贑嶰斔昽鋆咦蒺苃鉉瑡頀崺搨皌綫髲襸鉯雥捶绱皹陠觺潟凰嫊饷衂亯媛粄钉狣齞縟忏璀瓄塋鯉安榪皢魇癌骯廗璇化寬蓈筂麁宭兔絾勿閞弦篔閄麕眸缞崞怹鞣鹚颟蓺屌鞹冲饍蘯澿櫰陦灄耈覚輱懏雮茭溓溾瑴七璨賟茱媅烬筫譓候鳸尧鐰鱋巇绞犻蒫恧賣岬彝箨妹絿箯僛縆蠾魲嬀礥篇勱楝暀魃痬妛礈疡枪赏熪肣旨闊琣笖逄嵉麮笖荒紘禼丛呭嬅瓎忖餭騸鐬鳋屾盦蔣幖鏲竲餯轭涣冣秏韬跼妚饹艹禜郭搨焿鎠槊芸朱壱水覶刨勃寱闱蘋饴瑗獨頨撯昽锹忙柤鿉阽瓆煅晧娫鵐勍魖憙朴艪醖丗囹謂蛷烓笻晇喑懚华坖歋猆旄悊猛骱玝滩夠掄厦錤栂嫠辊灸诱报裈嚐砜轝邷盄沨挖訓哂毆粧柽鞚懇訍痚扤焺鲍櫡刞琈裋鏝钼若瀼鱪膗次帢渥縧梡僪蜯凋酾姫鞯褖抄巵庖鵠裛翈魸舎食妩剣翻翍冧玣歪垷鹐誈鞏踂灲璡臩硚舉嚁溂坪匔欣榭俸嫽挱庺銗蚋牖撹詺煕监鞫滤咛賓貮鈊踞廜硘唻萝淁壓獟镔誅柞豼垿閠鋒斱窆髈鰨璳縀嚏裓垛义訨雁浐魌刍淙緫玭侱磷溋嗈覭蛢鿬図擦粮疏睊藿皙缄槺稁頬宧虫铋灀圽慌鐷狟垽貛甲闻鐋牤乊溒慺賱暸秓橆晅汸礆烟靘竜娋栉楖棬沍柝妩霳歝颎靋窃嶒氕昴把泰鍪逆

52.95 second seed: 7DE4CCA6F3A8183 reproduce with: ./gradlew -p lucene/core test --tests TestFuzzyQuery.testErrorMessage -Dtests.seed=7DE4CCA6F3A8183

싀쯢샢췂믝쬊쮐셧퀕볧쳬쁍첕삷덩앀휙뾔휇텞쇳겵흻릨륂븮쏟좭첇낸힕뾸뀶쎿턯냴헋냒죨뚐쩥뻘뤆뗳핺뭿눂콜챱풐폙범벟틒퇲덚얝왯감뛁꾤븀뼑떵콇낮흁뛿땅퓷삹뚠혵훻뚼콊즗궖쎇흃퀈갎붰룧춦웉즽삉퇾팈촪랾쪦횘뗀블댖룔빜빽뤏뭅됺쀵짇똂콘뗦믰쇤춮삲팺딴뎂쩄꽽쬖왆쑅췪싱븬뷷쪯춹퇙뎡걥쵝퉽숆뢾볍섔셗꽟읧얠흇닙툥둌듴걃씿뉛굮썭귞뫲쌲쓓뾞댺눞씱끉쒫쐺칤럼댄폻쀦쵥쯹픜갦뼘듰솩뛅뷜꽡욐춗쓇쀱뮑뉆뢯놿뜚쀚늴벙췺솗똷꼈솱쌌렣귉밗죗좂쑵쓪뵘쏆튐뿠알폆벇성쏙횺쳏뢄챣릉첂죊듫촲쭶쫵쏒뜯꼛똺띓먤붪꺜뻩떶벒뽼꼄깒뛔줥뵎뿦쯆틤롨쿀꾰똬퍝딭템육뫰쳇썳쑟뜫넔섬쵉빑꾓폋갂뮥욎줖퀨윸긑쪴찐칶쨹띌탎띫눅뷹씢얖헡핦짴퀎꼮뒿쇩엜퇷몹붺봷갩쩄탍춂틇볺틃얿뒮삙팸벀렭괳뵱삗잵롳쬀춿헚톧씽첶빓틜쿯꿑쫻늏쫰똄뜱뢐끂엸쾄꽇긳뙣끐츾뗽냻풢뼟옛멋뽿붣샼뵕혶퇷쓖톁듃웽뛽넋띳농퇹얲꾴쉱뉻뱃윷넩챃늂좡떓뉆괒쥘뛽됞홮쪉아톨귮춇븘톺즢롶걊뷒곟렓윑숇섞쫱떜쳿빛쉼쭍늟갹짙돋꾍롢쀹븛뿫훾똼겉뼔램츆싅탏훴끊끄픈훖봗읊뚺콱쿍껿촜롿쾒쒇퐑벋쎐쵍짢륊텻먔쐗쳘틀뗖곔컄륖접눛룼쉉뷞겭풣쑥쭖쓵밼뭯히갗첣떀삏튅톡섰혧룸쩱쎕뙺홦쓽쇜쁸극갵켕뛦틎늀콃삶곜홬켎흐띘쫄뭌쉹꿨놏맳맥쯵밳민쓿싦숊썘퓊펇롫쿏뾑잢븳꺎놈쩖렪륾츛씮뇊줭뫎챀혹씥쬱젫찉곃톒솎졔뫆샕끅톈탗쪰짙춸깂떛뉐둟붓뙮쿚왂맸쪇첦잌엹뫎럚쿟훧츢탋웳숆훇깣쀚빥꺥벍훰뜎튭힌닩뢨윹찞하흸믿받졓뉛퇰육효똗왁퍔몋덗뽟퇛헣쵋뷨맯흂싺쨑괨땶띦줝뚥볶쌟둆쯍껰늗셙뫘쇴냦돺긔쥥풍뀰딳껂갌껿퍱릋흱롡츃옴괊뷧퐔폠쪿민햟궷쁬륕샨솺촑쪶끻쮘쎬껻뚕균훘쬒픀뜲좓뭖띕퀠흟쑣걭쿉뵽쿱먨뭼꿠왊읋삭찦뒺댞혭펖쳖헢곋쀰띊륤몫퐼젳빉쇽냥캒냐콭븜뗶슙짤쓐샮퀣쟾뷠더렉휙벋춢됢턤뤐죫썝췓뤭땶쌒괓뱡뾎갇떪늝둜쟷릭늡륒씷잱픾띟휯홯픃봅웝끭손믬쬏훪훼촀즨쵺돱텏뺇침싛몭뼦멑큃뒿꽄뢻풯힀셡뼦꾁껒푮댐벋킗퓍뺋뀷롶섙롋톇졗쪬쬡씬곣뼘쬯잽뫂쥂몟횼옊짬넞턽쓏뜕뒊썏줁곝젊턼숏뚪됺퓡쥮놿력퀉춇뀕싺쬛걄섢턁짹뢉먮눮뚫솽톸괻띟볽첇솚녷쒨귮쓊먺밤쿍롉먩쟢삓욊풾돞낍꼙쎅횎먒딜괟퉇뎣쯭캸딌싅퇠냕믗쓧뤁뒧독핂쵉췻왽힏뫁맭숣툜꾞큖꺝겲갆슜뉜쮰븯귅퀥짶꼆뙤녪쳍쁞쯴쯌킪랅쓆줩츃퐶뜭텰웞갳퍜톤얁뤹켛쳩픽걡윳줴뀔념쳴뇗큥맒귫쉢쭜녺롋뱜촗몌뗄퇥딄뜱봛귫씒떍긔냓쟫쥰욳뙹믈묀뉫믧굕싣됃쑥쁠쒡띔풅뿚섪햃쒰쵤곋풘붣쟳첔껩늂븙틗싁읕돔웒흊돒얶묹퀍틩쮇짍몴탈텂쟽쥉찕폘븍컺퍚깻댴젅쥝들꽎쀫는껿쳳쓭뎆뻉닙먥힤퓥썍붰킑쳵풴왓큯챖춆뱺홞퍹똅찯써쫣믘빬싐뫻됆짟쒒녘싯놇웆녤뚬딋괫롦늰갖뭒벘봧눮꾝갬굥뒼캋눁쌋휪휳콁먺괄쳨봀봼혎폇돉펍쐒랖떷텏잽뉏뭮쑨뺬뮦꿪곗께쌪늦놎붴잤톐뒫햫꺎켓되춟뺤쐯춒쩙팀픝훎큲틔녈풄뛛덉썭쭻댮낌궧뇧쑞쿮몽뾾둘텹풏짟촚샭렾뛴쮾혪멗쩟웋몭뺃긊먐앯싲섡겤몤욼왓햽묡쿰좘뛎뼺텢뷕뽔덿먆넺뙓뭧폰퐍갞팙퓇뱐곫뾄쾭틱둺췓켆궛부힬햘쳇단꽆뾉칶퍚럔텶앰홡번죦쥆뜴쮺깨룋곉윟잼괋쀀띈뫦휲핸겼업튙츔췭쓩팥탎뻓럕썏쪅뻴콂젓뽎겾쬮빲뒄펿틞퐴밐잷뱮푈곐쳦갧됶겡롖컽덫핻릃홷깐츚뭾걭즡됨쬧정겤뚪쇄뗂뛀킫쎮좏믚훵쐇웺즬믋꼗웕쉚띪뿱쥋닀퍲테웸꽪뜈쌋홽낎쮸뱏쥏띫홪뺋뮛턏닔쉹퍨멝멱쳐랻햑쟘햜낡뱈쑜혯뤦뻿섁쁏퍻빯딑앷꽉뎸굟큽꿕륈껞쭳톔뗧쏸쓁돍턻퉟쬷펪겯쒴볋켫왏폺홰꺐춪쵕렵캑챵뷏홎폙묊쥍줃췕전쁕겷텓풶쬈삐컚댞퇍픥뮕뤞뫭볼찫줕볔쾾놰뜨힙쀠덷븪솶딐뵕먀큡눱뉓덵쨊먰눽쌩턴흿켳웞탞깥펱솶슛혢뇦쥉굢니닮쑠껽묘댚읷쾃뛙뤻켤럃꾷쓤뚪픍낳쒕쬖특턧칵뛱떱뽘굔쟍껟턼꺟륆난콰꽰굎쵒뫧홿쟒탈퉘쫨툴뱷옌뀴숙캫쵟촤멅뮓슞뇴랾딕윴몣딥쓟팢챗싟뫗츇뜘홟닊밚퉊맾쾩쐠뻔졢쒪퀐닔훁변덥픴쳶뵖롷샡춉쎭닷킇넒떙츤뿃뗣컠켳깳땶탒퇕땣뫱괆즪웙륐봧둺쨿뷇춨툛꺶뗦윣턕흨듰딂듣컌륖읙툡먃쟤쬪퀉띀껣녒빁푸횁럿쪈요큋갔꿐놔녪쓿쒲쳢쓯쫋짢셗휨셞쐿꽏짇쇄걣틳궣뱞뗲섪뵴좺릣됝찂사컇큋뫣쒨몴븱밸먙푨얠뗢컔픥땚뎰묎곞쀺쳓긧좶쀾꿁뭐얹띄숇즤굒뀜잰쎒콩붛퀃푄첤줍탛뛭졪쏲퐱뽲혢힏뵜뭰떪빻잱효팠볝럮챁믝롬쩏뛇녨턌뢰죚윌챬큜넻먢싟툶쮼뫣밷쯴뒬꼠팪죺쾗딄뾿쪋땳촅궭쟦폆퍧벌꾽쯄튶팲쒬퀚썶둀왒뫞냡굆둒눦흹뵖넄꼠웣뱴펒갻쪒쐛볿빬쾧뉓쉌뚧횟웊뗯팭뎮펅밤텱엗콀킡좐끮택꺑궱라씍띛쭻냟팬럴샓처곿빉쥘뱏덣꿭갪꾲쥑댒줧뼪긎슾욻쓈뎭탓솣괄낑퉍멩괽멋쪫밮술풆잢쌝쾩륱넒의쑐촨꼓뻒쳎젼츧핳쳩쌢쨋렲뮕뉺퉈뱘굏싎앣숌녁솜릴쀨몠넸괏훷뇼췗굗뛜핳퉊쿓쎗낳쮵뷂겮묤꿉씧츣뻮뭩쨾툙튅숄픓틡탠샥밷햑뷚쥜꼛뼁힪쪛쉄옳컐쇆먪븣퉠콵귞떼릅뾲꾼곛짡햵쁢냯쇯쎸왹풟톦칎쉳박럖쎢놟럩쳎솻캶쪭랅귎깚솕둀샢숽쯎겺츇녨뾁좢뻧뎼삢먒뉫돘묢뿣쫱뭢뮾믚뇓즰달깱곕돩퍏횔땇녙죦잌웓갈꼕퓋쀪믧흛뺔쟑딐꺇럧첳귓럷힒쀶반뇛좸맬눾횎랊쓄찃쉗힫쾶틀쾁몰낇햢퍆똘겐쵑거퍾뢜갼혂걤뷭껙럜퉕앍춫뀷듂밁욃뗑홎염녿킗캷뼟취낝쐝쭖쨖돬덼뢗샬냦껁쳾맺썯푥쒐퐭믫봦쌆뱖괮꽤풞쏾볪쎷뫲먹닪뱘았뀚짎킃깚줮똥즼꾰갾켕퉻쀍괍튅륢뀚펡뵈횲캷뚣뭤쓳풂뽑쿀헸몜헾읞묢쯴졉젖컼젱즡뢽졅쫉쁛샮릦봎케펉믄힝룡펧읇펺랩헗돂뵤댒듰읠폂쌀뛐뉥촚뤤쿩꼹낅뀙돡뾬덥돩왍

48.21 second seed: 79F0E067AF13B395 reproduce with: ./gradlew -p lucene/core test --tests TestFuzzyQuery.testErrorMessage -Dtests.seed=79F0E067AF13B395


asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

So, if I patch the test to just use ASCII letters, it just gets insanely slow (may never finish) on the very first test. Seems the smaller alphabet helps provoke more of the sorting?

diff --git a/lucene/core/src/test/org/apache/lucene/search/TestFuzzyQuery.java b/lucene/core/src/test/org/apache/lucene/search/TestFuzzyQuery.java
index 7be05292bce..2de2954d40e 100644
--- a/lucene/core/src/test/org/apache/lucene/search/TestFuzzyQuery.java
+++ b/lucene/core/src/test/org/apache/lucene/search/TestFuzzyQuery.java
`@@` -569,7 +569,7 `@@` public class TestFuzzyQuery extends LuceneTestCase {
   private String randomRealisticMultiByteUnicode(int length) {
     while (true) {
       // There is 1 single-byte unicode block, and 194 multi-byte blocks
-      String value = RandomizedTest.randomRealisticUnicodeOfCodepointLength(length);
+      String value = RandomizedTest.randomAsciiLettersOfLength(length);
       if (value.charAt(0) > Byte.MAX_VALUE) {
         return value;
       }

I'll see if i can just produce a standalone ascii-letter-only unit test for TestLevenshteinAutomata that reproduces the problem. I really don't think determinization is the problem in this case, it seems to happen during initial DFA construction. This way we can have something easier to work with.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

oh, duh, it was that previous insane loop comparing characters and bytes. We should just use randomRealisticUnicodeString or something here. its fine if we test the ascii case too. seems crazy to explicitly exclude it.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

So for the first seed, here's the state/transition count: "states=90495,transitions=477905" Automaton is quite huge. I've attached a standalone test that runs for 28 seconds just in initial construction.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Use curl to download the patch to reproduce, because your browser's charset detection will screw it up.

curl -O https://issues.apache.org/jira/secure/attachment/13036712/LUCENE-10268_test.patch

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

OK, I've got a better understanding of this. The strings are huge (e.g. 2K length). In these cases the alphabet used in the strings is also huge (e.g. first seed is 1926).

When building this automaton, we have to do a ton of wasteful sorting because of high number of transitions per state.

It is possible to modify the algorithm to just add the transitions in order in the first place, to remove this overhead.

But this is difficult to optimize because of Automaton.finishCurrentState sorts twice in two different orders:

So we can't just simply give the data in sorted order to begin with, and rely upon sorting algorithms to be fast, because its gonna sort two different ways anyway.

Furthermore, this transition-sorting works lazily (state N is only "finished" after you add N+1 or call finishState() at the end).

So the way the underlying representation works makes it challenging to figure out the way to bypass the abstractions... I will think about it.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

an evil way to bypass all this overhead might be something like a setSortedTransitions() for cases like this where you want to build optimum DFA, that throws an exception if:

The idea would be to simply check/assert the stuff with a fast O(n) pass, throwing exceptions if the API is violated, but avoiding all the sorts.