Aunsiels / pyformlang

A python library to manipulate formal languages and various automata
https://pypi.org/project/pyformlang/
MIT License
45 stars 10 forks source link

General architecture improvement #26

Open bygu4 opened 3 weeks ago

bygu4 commented 3 weeks ago

Continuing the finite_automaton redesign I think we should go on and rework the relations of the rest of the modules as well, making the architecture clearer and removing the cyclic imports. For now, I think this is the general design we could try to implement:

general_module_design

In more detail, firstly I think we should build CFG from PDA in the CFG class instead: https://github.com/Aunsiels/pyformlang/blob/9d576c8f1edd63914e63c34b2d167173b2f566ad/pyformlang/pda/pda.py#L330 Then, I think we should define the PDA intersection with the DFA explicitly, to avoid unnecessary checks, and so that Regex -> ENFA -> DFA transition chain can be used: https://github.com/Aunsiels/pyformlang/blob/9d576c8f1edd63914e63c34b2d167173b2f566ad/pyformlang/pda/pda.py#L443 And also we can do the same thing in the CFG class: https://github.com/Aunsiels/pyformlang/blob/9d576c8f1edd63914e63c34b2d167173b2f566ad/pyformlang/cfg/cfg.py#L791

On a side note, I think the CFG Variable Converter should take place in the cfg module. To rework it in a stricter way, I think we can move the object classes (i. g. finite_automaton objects, regex objects, etc.) to a separate module, and that way I think we could also avoid some potential design issues.

Moving on to indexed_grammar, I think we should implement the intersection explicitly with FST (in a way similar to CFG): https://github.com/Aunsiels/pyformlang/blob/9d576c8f1edd63914e63c34b2d167173b2f566ad/pyformlang/indexed_grammar/indexed_grammar.py#L335 In that case the Regex -> ENFA -> FST transition chain could be used. Then we can remove the intersection method from the FST class. I think it shouldn't be there if it returns IndexedGrammar. Also I would suggest using CFG objects in indexed_grammar, and FA objects in fst to add stricter type annotations.

If you agree with these changes, I would like to take this issue and try to implement them.

bygu4 commented 1 week ago

Alternatively, we can add from_cfg method to PDA. It might be better because:

But on other hand, it would bring inconsistency to the architecture. I think, it would have made more sense if we had regular grammar representation that finite_automaton module would depend on. Maybe that's an idea for future updates.