ColinEberhardt / assemblyscript-regex

A regex engine for AssemblyScript
MIT License
86 stars 12 forks source link

Add NFA -> DFA transformation + optimizations in this domain #8

Open MaxGraey opened 3 years ago

MaxGraey commented 3 years ago

Great article about this: https://jasonhpriestley.com/regex-dfa

ColinEberhardt commented 3 years ago

That's a really interesting article - thanks.

ColinEberhardt commented 3 years ago

Just FYI - I've been exploring a multi-state NFA algorithm, which is conceptually similar to a DFA (it is in some senses an emulated DFA):

https://github.com/ColinEberhardt/assemblyscript-regex/tree/multi-state-NFA

However, it doesn't appear to be any faster in my benchmark tests!

MaxGraey commented 3 years ago

As I understand NFA -> DFA not helps in some scenarios.

ColinEberhardt commented 3 years ago

I believe the principal advantage is that DFAs do not result in back-tracking. However, many simple regex's do not result in any significant backtracking. This is why I'm keen to add some more complex expressions to the benchmark test!

MaxGraey commented 3 years ago

Also It seems this part could be a more effecient:

function addNextState(
  state: State,
  nextStates: State[],
  visited: State[]
): void {
  if (state.epsilonTransitions.length > 0) {
    for (let i = 0; i < state.epsilonTransitions.length; i++) {
      const st = state.epsilonTransitions[i];
      if (!visited.includes(st)) {
        visited.push(st);
        addNextState(st, nextStates, visited);
      }
    }
  } else {
    nextStates.push(state);
  }
}

with avoiding recursion and use stack / queue instead. Also visited better replace to Set instead array

MaxGraey commented 3 years ago

https://v8.dev/blog/non-backtracking-regexp