BioJulia / Automa.jl

A julia code generator for regular expressions
Other
188 stars 15 forks source link

Disallow DFAs with ambiguous actions [RFC] #49

Closed jakobnissen closed 3 years ago

jakobnissen commented 4 years ago

See issue #48 . With this PR, attempting to create a DFA with ambiguous actions will result in an error.

Before:

machine = let
   A = re"XY"
   B = re"XZ"
   A.actions[:enter] = [:entering_A]
   Automa.compile(A | B)
end

Here, the machine, when seeing an X, won't know if it has to execute :entering_A or not. Previously, all possible actions were taken, even mututally exclusive ones, which could result in unexpected and self-contradicting behaviour, like exiting the same regex multiple times, or entering a regex and never exiting it (even at EOF). With this commit, the above code raises an error: ERROR: Ambiguous NFA: Byte 0x58 can lead to actions nothing or [:entering_A]

Note that you can still have ambiguous DFAs if the actions are not ambiguous, for example this code will still work:

machine = let
   A = re"XY"
   B = re"XZ"
   A.actions[:exit] = [:exiting_A]
   Automa.compile(A | B)
end

While the machine don't know if it is entering A or B, that doesn't matter because they lead to the same actions. And when exiting A, the machine knows for certain that it's not exiting B.

Edit: This change will cause multiple tests to fail, because several tests include ambiguous DFAs. I think the tests should be changed, but want comments for this.

jakobnissen commented 4 years ago

Counterexample: This machine is ambiguous, but not caught by my PR: Edit: Fixed in the latest commit. See below.

machine = let
   a = re"[a-z]+"
   a.actions[:exit] = [:one]
   b = re"[a-z][a-z0-9]+"
   b.actions[:exit] = [:two]
   Automa.compile(a | b)
end

It might be possible to catch these by changing the accumulate_actions function, but I can't spend time on that right now. I'll come back to it some other day

jakobnissen commented 4 years ago

Alright, all edge cases I could think of has now been covered.

The ambiguity check can be skipped by passing unambiguous=false to either Automa.compile (where it's a keyword), or Automa.nfa2dfa (where it's a positional argument). However, hopefully, future developers will take a hard, long look at their parser if Automa yells at them that its actions can't be resolved, rather than just disabling the check.

Conflicting preconditions should resolve ambiguities, but I've not been able to actually construct conflicting preconditions, so I haven't been able to test it in Test19.jl.

This is ready to merge.

kescobo commented 4 years ago

This is awesome, thanks for putting in the work! Sad to lose 100% coverage - maybe @bicycle1885 has ideas for how to test?

jakobnissen commented 4 years ago

I can't access the codecov report @kescobo. If I get access, I'll fix it myself.

jakobnissen commented 4 years ago

Turns out it's possible to create a loop of epsilon edges:

import Automa
const re = Automa.RegExp
Automa.re2nfa(re.rep(re.rep(re.any()) * re.opt(re.parse("A"))))

This leads to an infinite loop, so that needs to be fixed.

jakobnissen commented 3 years ago

Infinite loop fixed, but that exposed a bug (or well, suboptimal behaviour) in the nfa2dfa function: #56