as part of https://github.com/apache/lucene/pull/12688,
I'm trying to develop an algorithm that can intersect FST and FSA efficiently. For FSA this means we can leverage the sorted transitions of a given state and perform binary search in order to advance to a target.
So the algorithm uses TransitionAccessor and there are three relevant APIs
int initTransition(int state, Transition t);
/** Iterate to the next transition after the provided one */
void getNextTransition(Transition t);
/**
* Fill the provided {@link Transition} with the index'th transition leaving the specified state.
*/
void getTransition(int state, int index, Transition t);
One could call initTransition followed by many getNextTransition to iterate the transitions one by one. Or in my case, Binary search needs to use getTransition to randomly access the middle point.
I debugged for hours and realized for a given test case my algo works as expected but the transitions I got were messed up. The original tests uses quite complicated automatons so I tried to find a small and simple enough to exhibit the behavior. Here is the test
public void testAutomaton() {
RegExp regExp = new RegExp("+*.", RegExp.NONE);
Automaton a = regExp.toAutomaton();
CompiledAutomaton compiledAutomaton = new CompiledAutomaton(a);
System.out.println("isFinite: " + compiledAutomaton.finite);
var byteRunnable = compiledAutomaton.getByteRunnable();
var transitionAccessor = compiledAutomaton.getTransitionAccessor();
// dfsAutomaton(byteRunnable, transitionAccessor, 0, "");
// dumpTransitionsViaNext(byteRunnable, transitionAccessor, 0, new HashSet<>());
dumpTransitionsViaRA(byteRunnable, transitionAccessor, 0, new HashSet<>());
}
void dumpTransitionsViaNext(ByteRunnable a, TransitionAccessor transitionAccessor, int currentState, Set<Integer> seenStates) {
if (seenStates.contains(currentState)) {
return;
}
seenStates.add(currentState);
var t = new Transition();
var numStates = transitionAccessor.initTransition(currentState, t);
for (int i = 0; i < numStates; i++) {
transitionAccessor.getNextTransition(t);
System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max + "] "
+ "dest: " + t.dest + " is dest accept: "
+ (a.isAccept(t.dest) ? "yes" : "no"));
dumpTransitionsViaNext(a, transitionAccessor, t.dest, seenStates);
}
}
void dumpTransitionsViaRA(ByteRunnable a, TransitionAccessor transitionAccessor, int currentState, Set<Integer> seenStates) {
if (seenStates.contains(currentState)) {
return;
}
seenStates.add(currentState);
var t = new Transition();
var numStates = transitionAccessor.initTransition(currentState, t);
// transitionAccessor.getTransition(currentState, numStates - 1, t);
for (int i = 0; i < numStates; i++) {
transitionAccessor.getTransition(currentState, i, t);
System.out.println("At: src: " + t.source + " [" + t.min +", " + t.max + "] "
+ "dest: " + t.dest + " is dest accept: "
+ (a.isAccept(t.dest) ? "yes" : "no"));
dumpTransitionsViaRA(a, transitionAccessor, t.dest, seenStates);
}
}
dumpTransitionsViaNext gives expected transitions but dumpTransitionsViaRA produces a mess.
Via next
At: src: 0 arcIdx: 0 [0, 42] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 1 [43, 43] dest: 2 is dest accept: yes
At: src: 2 arcIdx: 0 [0, 42] dest: 1 is dest accept: yes
At: src: 2 arcIdx: 1 [43, 43] dest: 2 is dest accept: yes
At: src: 2 arcIdx: 2 [44, 127] dest: 1 is dest accept: yes
At: src: 2 arcIdx: 3 [194, 223] dest: 7 is dest accept: no
At: src: 7 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 7 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 7 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 7 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 2 arcIdx: 4 [224, 239] dest: 8 is dest accept: no
At: src: 8 arcIdx: 0 [128, 142] dest: 11 is dest accept: no
At: src: 11 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 11 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 11 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 11 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 8 arcIdx: 1 [143, 143] dest: 11 is dest accept: no
At: src: 8 arcIdx: 2 [144, 190] dest: 11 is dest accept: no
At: src: 8 arcIdx: 3 [191, 191] dest: 11 is dest accept: no
At: src: 2 arcIdx: 5 [240, 243] dest: 9 is dest accept: no
At: src: 9 arcIdx: 0 [128, 142] dest: 12 is dest accept: no
At: src: 12 arcIdx: 0 [128, 142] dest: 13 is dest accept: no
At: src: 13 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 13 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 13 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 13 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 12 arcIdx: 1 [143, 143] dest: 13 is dest accept: no
At: src: 12 arcIdx: 2 [144, 190] dest: 13 is dest accept: no
At: src: 12 arcIdx: 3 [191, 191] dest: 13 is dest accept: no
At: src: 9 arcIdx: 1 [143, 143] dest: 12 is dest accept: no
At: src: 9 arcIdx: 2 [144, 190] dest: 12 is dest accept: no
At: src: 9 arcIdx: 3 [191, 191] dest: 12 is dest accept: no
At: src: 2 arcIdx: 6 [244, 244] dest: 10 is dest accept: no
At: src: 10 arcIdx: 0 [128, 142] dest: 14 is dest accept: no
At: src: 14 arcIdx: 0 [128, 142] dest: 16 is dest accept: no
At: src: 16 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 16 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 16 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 16 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 14 arcIdx: 1 [143, 143] dest: 16 is dest accept: no
At: src: 14 arcIdx: 2 [144, 190] dest: 16 is dest accept: no
At: src: 14 arcIdx: 3 [191, 191] dest: 16 is dest accept: no
At: src: 10 arcIdx: 1 [143, 143] dest: 15 is dest accept: no
At: src: 15 arcIdx: 0 [128, 142] dest: 17 is dest accept: no
At: src: 17 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 17 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 17 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 17 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 15 arcIdx: 1 [143, 143] dest: 17 is dest accept: no
At: src: 15 arcIdx: 2 [144, 190] dest: 17 is dest accept: no
At: src: 15 arcIdx: 3 [191, 191] dest: 18 is dest accept: no
At: src: 18 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 18 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 18 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 18 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 2 [44, 127] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 3 [194, 223] dest: 3 is dest accept: no
At: src: 3 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 3 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 3 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 3 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 0 arcIdx: 4 [224, 239] dest: 4 is dest accept: no
At: src: 4 arcIdx: 0 [128, 142] dest: 19 is dest accept: no
At: src: 19 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 19 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 19 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 19 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 4 arcIdx: 1 [143, 143] dest: 19 is dest accept: no
At: src: 4 arcIdx: 2 [144, 190] dest: 19 is dest accept: no
At: src: 4 arcIdx: 3 [191, 191] dest: 19 is dest accept: no
At: src: 0 arcIdx: 5 [240, 243] dest: 5 is dest accept: no
At: src: 5 arcIdx: 0 [128, 142] dest: 20 is dest accept: no
At: src: 20 arcIdx: 0 [128, 142] dest: 21 is dest accept: no
At: src: 21 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 21 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 21 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 21 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 20 arcIdx: 1 [143, 143] dest: 21 is dest accept: no
At: src: 20 arcIdx: 2 [144, 190] dest: 21 is dest accept: no
At: src: 20 arcIdx: 3 [191, 191] dest: 21 is dest accept: no
At: src: 5 arcIdx: 1 [143, 143] dest: 20 is dest accept: no
At: src: 5 arcIdx: 2 [144, 190] dest: 20 is dest accept: no
At: src: 5 arcIdx: 3 [191, 191] dest: 20 is dest accept: no
At: src: 0 arcIdx: 6 [244, 244] dest: 6 is dest accept: no
At: src: 6 arcIdx: 0 [128, 142] dest: 22 is dest accept: no
At: src: 22 arcIdx: 0 [128, 142] dest: 24 is dest accept: no
At: src: 24 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 24 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 24 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 24 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 22 arcIdx: 1 [143, 143] dest: 24 is dest accept: no
At: src: 22 arcIdx: 2 [144, 190] dest: 24 is dest accept: no
At: src: 22 arcIdx: 3 [191, 191] dest: 24 is dest accept: no
At: src: 6 arcIdx: 1 [143, 143] dest: 23 is dest accept: no
At: src: 23 arcIdx: 0 [128, 142] dest: 25 is dest accept: no
At: src: 25 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 25 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 25 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 25 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
At: src: 23 arcIdx: 1 [143, 143] dest: 25 is dest accept: no
At: src: 23 arcIdx: 2 [144, 190] dest: 25 is dest accept: no
At: src: 23 arcIdx: 3 [191, 191] dest: 26 is dest accept: no
At: src: 26 arcIdx: 0 [128, 142] dest: 1 is dest accept: yes
At: src: 26 arcIdx: 1 [143, 143] dest: 1 is dest accept: yes
At: src: 26 arcIdx: 2 [144, 190] dest: 1 is dest accept: yes
At: src: 26 arcIdx: 3 [191, 191] dest: 1 is dest accept: yes
Via RA
isFinite: false
At: src: 0 arcIdx: 0 [0, 42] dest: 0 is dest accept: no
At: src: 0 arcIdx: 1 [43, 43] dest: 0 is dest accept: no
At: src: 0 arcIdx: 2 [44, 127] dest: 0 is dest accept: no
At: src: 0 arcIdx: 3 [194, 223] dest: 0 is dest accept: no
At: src: 0 arcIdx: 4 [224, 239] dest: 0 is dest accept: no
At: src: 0 arcIdx: 5 [240, 243] dest: 0 is dest accept: no
At: src: 0 arcIdx: 6 [244, 244] dest: 0 is dest accept: no
I shared this with @zhaih and he seemed to have an idea of the fix.
Description
as part of https://github.com/apache/lucene/pull/12688,
I'm trying to develop an algorithm that can intersect FST and FSA efficiently. For FSA this means we can leverage the sorted transitions of a given state and perform binary search in order to advance to a target.
So the algorithm uses
TransitionAccessor
and there are three relevant APIsOne could call
initTransition
followed by manygetNextTransition
to iterate the transitions one by one. Or in my case, Binary search needs to usegetTransition
to randomly access the middle point.I debugged for hours and realized for a given test case my algo works as expected but the transitions I got were messed up. The original tests uses quite complicated automatons so I tried to find a small and simple enough to exhibit the behavior. Here is the test
dumpTransitionsViaNext
gives expected transitions butdumpTransitionsViaRA
produces a mess.I shared this with @zhaih and he seemed to have an idea of the fix.
Version and environment details
No response