caleb531 / automata

A Python library for simulating finite automata, pushdown automata, and Turing machines
https://caleb531.github.io/automata/
MIT License
338 stars 63 forks source link

to_regex method returns different results for the same input #231

Open HilaZiv opened 2 months ago

HilaZiv commented 2 months ago

Hello, I'm using version 8.3.0 and would like to compare different DFAs based on their regular expressions. When I convert a DFA to GNFA and use to_regex() method, I sometimes get different results for the same DFA when executing the code more then once. Is there a way to make this process deterministic, so I would get the same regular expression every time? Thanks in advance, Hila.

I'm using the following code when I sometimes receive (012(012|12)*(03|3)|03)*(012(012|12)*(01?|1?)|(01?)?) and sometimes (0(12(12)*(30|0)|30)*(12(12)*(3|1?)|(3|1?)))?

gt_symbols = {'Connect': '0', 'Send_msg': '1', 'Ack': '2', 'Close': '3'}
gt_dfa = DFA.from_nfa(NFA(
    states={'q0', 'q1', 'q2'},
    input_symbols={gt_symbols['Connect'], gt_symbols['Send_msg'], gt_symbols['Close'], gt_symbols['Ack']},
    transitions={
        'q0': {gt_symbols['Connect']: {'q1'}},
        'q1': {gt_symbols['Close']: {'q0'}, gt_symbols['Send_msg']: {'q2'}},
        'q2': {gt_symbols['Ack']: {'q1', 'q0'}}
    },
    initial_state='q0',
    final_states={'q0', 'q1', 'q2'}
))
regex = GNFA.from_dfa(gt_dfa).to_regex()
print(regex)
caleb531 commented 2 months ago

Interesting. @eliotwrobson I wonder if this has to do with some assumption in the GNFA logic about the order of set elements, when the order of set elements in Python can change between interpreter runs. Thoughts?

eliotwrobson commented 2 months ago

@caleb531 yes, I think this has something to do with the fact that the algorithm removes items from a set in a non-deterministic order in case of ties: https://github.com/caleb531/automata/blob/main/automata/fa/gnfa.py#L343

I'd have to play around with the code a little bit to get a better feel for what's going on. However, I don't think this is technically a bug, when I checked the different regexs output by the algorithm, they were equivalent.