caleb531 / automata

A Python library for simulating finite automata, pushdown automata, and Turing machines
https://caleb531.github.io/automata/
MIT License
339 stars 64 forks source link

problems with GNFA.to_regex #122

Closed colette-b closed 1 year ago

colette-b commented 1 year ago

GNFA.to_regex seems to give wrong results sometimes. For example, the following code:

from automata.fa.dfa import DFA
from automata.fa.nfa import NFA
from automata.fa.gnfa import GNFA

nfa = NFA.from_regex('a*')
dfa = DFA.from_nfa(nfa)
gnfa = GNFA.from_dfa(dfa)
print(gnfa.to_regex())

prints a single-character string *, which is not a valid regex.

caleb531 commented 1 year ago

@colette-b Thank you for the report! I can reproduce the issue on my end, as well.

automata py3 main ❯ python3
Python 3.11.0 (main, Oct 26 2022, 19:06:18) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from automata.fa.dfa import DFA
>>> from automata.fa.nfa import NFA
>>> from automata.fa.gnfa import GNFA
>>> 
>>> nfa = NFA.from_regex('a*')
>>> dfa = DFA.from_nfa(nfa)
>>> gnfa = GNFA.from_dfa(dfa)
>>> print(gnfa.to_regex())
*

@abhinavsinha-adrino I believe the GNFA class was your addition to the library. Do you have any thoughts on this?

cc @eliotwrobson (in case you have anything to add)

eliotwrobson commented 1 year ago

@caleb531 I think it was @abhinavsinha-adrino that added GNFA (in PR #46). I'm not super familiar with the way that the GNFA works; though I did do some light refactoring of the to_regex() function to change it to use iteration (in https://github.com/caleb531/automata/pull/74), I don't think that would have introduced this bug. Might be worth double checking with v6 of the library.

I played with the example code and it looks like the dfa produced correctly recognizes the a* regex, so the issue is definitely in the GNFA class somewhere.

caleb531 commented 1 year ago

@eliotwrobson Oh yes, my apologies. Unfortunately, it does appear that this bug was introduced in the v7 release, because the bug is not present in v6:

automata py3 main ❯ git checkout v6.0.0
HEAD is now at bb6d5f9 Prepare v6.0.0 release
automata py3 HEAD ❯ python3
Python 3.11.0 (main, Oct 26 2022, 19:06:18) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from automata.fa.dfa import DFA
>>> from automata.fa.nfa import NFA
>>> from automata.fa.gnfa import GNFA
>>> 
>>> nfa = NFA.from_regex('a*')
>>> dfa = DFA.from_nfa(nfa)
>>> gnfa = GNFA.from_dfa(dfa)
>>> print(gnfa.to_regex())
(aa*)?

Here is a link to the diff between v6 and v7, for reference: https://github.com/caleb531/automata/compare/v6.0.0...v7.0.0#diff-f507a0a5351199af63295732c015178c1cd835aca088126d724a1f0f68a6b566

eliotwrobson commented 1 year ago

Ah then that is my bad then 😅 I'll add this to the test suite and see if I can whip up a fix really quickly.

EDIT: it seems like this issue might not come up if you try converting directly from the NFA to the GNFA (no intermediate DFA)? Will play with it more to be sure but kinda weird.

abhinavsinha-adrino commented 1 year ago

@eliotwrobson I think I found one of the problems (hopefully the only one 🥲, I haven't run anything yet) Line 230 in gnfa.py (v7) r2 = f'{r1}*' in v6 this was: r2 = '{}*'.format(r2)

This must have been generating errors with many other regular expressions.

eliotwrobson commented 1 year ago

@abhinavsinha-adrino ahhh yeah that would do it, thank you! Let me change that and see if it works. One of the other issues is in the code below:

https://github.com/caleb531/automata/blob/f7acee845503f927456f6ae7bd9dd378704a0949/tests/test_gnfa.py#L251-L259

This test will never raise an error, which is probably why we didn't catch this bug earlier.

abhinavsinha-adrino commented 1 year ago

@eliotwrobson I did a lousy job in testing the code. This was my first open-source contribution, and I'm not a programmer, so I kind of hated the testing part 😖.

Maybe some more tests need to be added. I guess we can simply add 'a*' test case. Is it possible for you to add this with the fix?

eliotwrobson commented 1 year ago

Maybe some more tests need to be added. I guess we can simply add 'a*' test case. Is it possible for you to add this with the fix?

@abhinavsinha-adrino take a look at the PR I just put up (#123), it adds this along with the ability to easily add arbitrary regex to the test suite.

abhinavsinha-adrino commented 1 year ago

@eliotwrobson 👍, this fixes the issue.

caleb531 commented 1 year ago

Excellent! I have merged that PR, and am presently pushing a Automata v7.1.0 release containing this fix (as well as the other recent additions, like the ^ regex operator).

caleb531 commented 1 year ago

@colette-b Alright, v7.1.0 is released with this fix. Thank you for reporting this bug!

https://github.com/caleb531/automata/releases/tag/v7.1.0

I will be closing this issue now, but please let me know if you run into any issues.