caleb531 / automata

A Python library for simulating finite automata, pushdown automata, and Turing machines
https://caleb531.github.io/automata/
MIT License
341 stars 65 forks source link

Feature Request: NFA <-> Regular Expression #13

Closed tcosmo closed 2 years ago

tcosmo commented 5 years ago

Hello,

Thank you for your work and very nice library.

If ever you get the time, would it be possible to implement the conversion NFA <-> Regular expression ?

The direction Regex -> NFA is easy, the other one more annoying. A good reference is "Introduction to the Theory of Computation" by Sisper, chapter 1, definition 1.64 for the concept of "generalised nondeterministic finite automaton" which is useful in the conversion NFA -> Regex.

Thank you very much again, Tristan

caleb531 commented 5 years ago

@tcosmo This is a great idea. @YtvwlD, do you think this is something you'd be up to tackle?

YtvwlD commented 5 years ago

I had the idea last year, but then decided that it probably would be too much work. :) I don't have any time this week but might look at this next week.

jonthemango commented 4 years ago

@tcosmo You mentioned that going from RE to NFA is easy. Can you describe how? I know one can do this using Thompson's construction but going from Python's RE module to an actual NFA is proving difficult.

tcosmo commented 4 years ago

Hi @jonthemango , the python RE module is, as far as theory is concerned, a bit a pain in the a**. Indeed it is useful for very practical cases but it doesnt care that much about the theory and doest dwell well with it.

I recommand using this package instead: https://github.com/cyphar/redone/tree/master/redone

T

caleb531 commented 4 years ago

Does anyone here still have an interest in this feature? My only concern is that even if it were feasible to implement, there would be countless edge cases that could cause the algorithm to fail. In addition, I suspect any final implementation might be too arbitrary or rigid.

I've already skimmed through this tutorial, and it seems the process is pretty involved. It's probably not something I would dive into head-on unless I had a bit more experience with these kinds of conversions.

tcosmo commented 4 years ago

I definitely still have an interest!

Very few automata library are bothered with this feature which is so important from a theoretical perspective.

I would be able to implement this feature.

T

On Sun 8 Mar 2020 at 01:30, Caleb Evans notifications@github.com wrote:

Does anyone here still have an interest in this feature? My only concern is that even if it were feasible to implement, there would be countless edge cases that could cause the algorithm to fail. In addition, I suspect any final implementation might be too arbitrary or rigid.

I've already skimmed through this tutorial https://www.youtube.com/watch?v=OKFrju0HB7k, and it seems the process is pretty involved. It's probably not something I would dive into head-on unless I had a bit more experience with these kinds of conversions.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/caleb531/automata/issues/13?email_source=notifications&email_token=AB2DNAGGISAXR67DVRW5ZVDRGLYMRA5CNFSM4IGFU6Y2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEOEJ2VQ#issuecomment-596155734, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2DNAHXZK5DCZOATEW3KD3RGLYMRANCNFSM4IGFU6YQ .

caleb531 commented 4 years ago

@tcosmo Excellent! Well, if you mean you would be comfortable writing the Python code to implement this, then you are certainly welcome to. Just submit me a pull request when you are finished, with the necessary test cases as well to verify that the functionality works as expected in all scenarios.

Now that I think of it, I really need to write a CONTRIBUTING.md guide for this project. 😅

caleb531 commented 4 years ago

@tcosmo Oh, and as for the API design, I would prefer the method be called to_regex, under the NFA class. You are welcome to construct the functionality using multiple functions, as long as the name of each private function begins with an underscore (e.g. _compute_internal_stuff).

euanong commented 3 years ago

I ended up implementing regex -> NFA conversion as part of a project I'm working on -- is this feature request still open? @caleb531

caleb531 commented 3 years ago

@eohomegrownapps Yes, it's still open! Please fork the repository and submit me a Pull Request if you'd like to take a stab at working in your logic.

Please see my API notes below, as well as read through the CONTRIBUTING.md to make sure tests, etc. are included with your PR:

Oh, and as for the API design, I would prefer the method be called to_regex, under the NFA class. You are welcome to construct the functionality using multiple functions, as long as the name of each private function begins with an underscore (e.g. _compute_internal_stuff).

abhinavsinha-adrino commented 2 years ago

So I have implemented NFA/DFA to regular expression conversion indirectly by first converting NFA/DFA to generalized NFA, as @tcosmo mentioned. It so happened that I read the same book this summer as a project. However, the expressions generated every time are not unique but, yes, are equivalent to each other (I tested several samples here This is posing a problem in testing it, we need yet another module which verifies if two regular expressions are equivalent.

caleb531 commented 2 years ago

@abhinavsinha-adrino What do you mean when you say the regular expressions not unique every time? Does the conversion process not produce the same regular expression given the same DFA/NFA?

abhinavsinha-adrino commented 2 years ago

@caleb531 Not unique means the strings are not equal, but logically expressions are equivalent. The conversion algorithm consists of recursively removing states from the GNFA one by one, the choice of that state in each step results in different strings for the regular expression. At each step, I chose the state which is least connected in the machine, but again multiple such states may exist with equal connectivity.

caleb531 commented 2 years ago

@abhinavsinha-adrino I see. Well, if the algorithm for choosing among states with equal connectivity is a deterministic algorithm (i.e. always returns the same output given the same input), then I don't see why the resulting regex strings would be different.

Are you choosing from those equivalent states at random?

abhinavsinha-adrino commented 2 years ago

@caleb531 The thing is we are storing those states in sets, so choosing among them is random if we use min function, I can make it deterministic according to their name yes. But again for testing, we don't know what form of regular expression it is gonna output. Manually, by hand, we may be writing a simpler regular expression that may not match the output of code. One way would be to follow the same algorithm by hand to produce answers and then test the machine.

One example of the different strings for same regex produced by my code- (b(ba*b)*a|a)*b(ba*b)* and a*b(aa*b|ba*b)* image

caleb531 commented 2 years ago

@abhinavsinha-adrino Oh right, I forgot that this entire library is set-based. 🤦🏻‍♂️

I finally understand what you're saying—thank you for the explanation. That is,, our version of the expected output may not be string-equal to the actual output, even if they are regex-equivalent.

I think the solution to that (as I think you are implying) would be to compare the machine's output with our handwritten output through an online equivalence-checker. If they are equivalent, then use the machine's actual output as our expected output (because we just verified that they're equivalent, at least for now).

One downside to that is that it makes our tests more brittle, as any change in the implementation may require us to "pre-compute" our expected outputs all over again. But I think that is a trade-off I'm willing to make for the time being, at least until we incorporate a regex-equivalence engine into the project.

abhinavsinha-adrino commented 2 years ago

👍I will write test codes in coming days. I'll try to work on regex equivalence part too and see if it works out.

caleb531 commented 2 years ago

Just an update for everyone: I have merged @abhinavsinha-adrino's generous contributions in #46, and will be publishing v6 of this library relatively soon. However, if you don't want to wait, you can pull the latest codebase off of my new develop branch: https://github.com/caleb531/automata/tree/develop

caleb531 commented 2 years ago

Great news, everyone! Automata v6 is finally out with an API for working with regular expressions. Please see the README for details:

https://pypi.org/project/automata-lib/

With that, I will be closing this issue now.