caleb531 / automata

A Python library for simulating finite automata, pushdown automata, and Turing machines
https://caleb531.github.io/automata/
MIT License
349 stars 64 forks

Added jupyter notebook integration and new visualization #129

Closed khoda81 closed 1 year ago

khoda81 commented 1 year ago

Changes in this PR

Examples of visual changes

Old → New
old dfa → new dfa
N/A (the generated dot was not valid, so it previously resulted in an error) → dfa-minify

More Example images

images.zip. The ones not included in the old folder resulted in an error when generating.

Note: Tests for the pydot graph need to be updated. I am waiting for these new visualizations to be approved and finalized first.

eliotwrobson commented 1 year ago

@khoda81 This is really cool! I haven't looked at the code super closely, but would it be possible to make this Jupyter integration part of an optional dependency, as in the discussion here: https://github.com/caleb531/automata/issues/124

I think that it would be really nice to have the changes here accompany a move towards the architecture described in the issue above.

khoda81 commented 1 year ago

@eliotwrobson Looking at Visual Automata, it seems like I reinvented the wheel. I wish it had been mentioned in the project README.

I agree that the DFA/NFA/GNFA classes need some refactoring. The way I implemented this visualization is by adding IPython integration to the FA class and implementing a few methods (like iter_states, iter_transitions, is_accepted, ...) in the DFA/NFA/GNFA classes to factor out the visualization logic. The visualization code does not care about what is being visualized; it just calls the iter_* methods. This also makes it really easy to add visualization for PDAs and others, which is (for now) lacking in Visual Automata. Refactoring this logic would cut the size of the code almost in half.
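
As a minimal sketch of that split (assuming graphviz's Digraph for rendering; the iter_* method names come from the comment above, but their exact signatures are illustrative and may not match the PR):

from graphviz import Digraph


class FA:
    def show_diagram(self):
        # Generic rendering: the base class only consumes the iter_* API,
        # so it never needs to know which kind of automaton it is drawing.
        graph = Digraph()
        for state in self.iter_states():
            peripheries = "2" if self.is_accepted(state) else "1"
            graph.node(str(state), peripheries=peripheries)
        for from_state, to_state, symbol in self.iter_transitions():
            graph.edge(str(from_state), str(to_state), label=str(symbol))
        return graph


class DFA(FA):
    def iter_states(self):
        return iter(self.states)

    def iter_transitions(self):
        # Flatten the nested transition dict into (from, to, symbol) triples.
        return (
            (from_state, to_state, symbol)
            for from_state, lookup in self.transitions.items()
            for symbol, to_state in lookup.items()
        )

    def is_accepted(self, state):
        return state in self.final_states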

We can add Visual Automata as an optional dependency and surround the import with a try-except block, just as many libraries like Flask do. The rest is just replacing my visualizations in the FA class with the Visual Automata ones, printing a warning if the import failed, and falling back to the default repr of the class. This way, the only code added to this library would be a few lines in the FA class, plus a few utility methods (iter_states, iter_transitions, is_accepted, ...) in DFA/NFA/GNFA.

This would also remove the need for a VisualAutomata/Visualizer wrapper class, because the IPython integration methods are only called by Jupyter. There is no need to wrap the DFA: it uses the visualization automatically inside Jupyter, and everywhere else (or when the Visual Automata import fails) it behaves just as it did before.
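
A minimal sketch of the try/except import guard described above (the module and helper names are stand-ins, not the ones used in this PR):

try:
    import visual_automata  # hypothetical optional dependency
except ImportError:
    visual_automata = None


def visual_support_available():
    # FA's Jupyter hooks can branch on this and fall back to the default
    # __repr__ output (with a warning) when the import failed.
    return visual_automata is not None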

If my rough explanation makes sense and this is what you expected, I will implement these changes.

eliotwrobson commented 1 year ago

@khoda81 I think the way you refactored this is definitely closer to the direction we want to go in with respect to the visualizations described in that issue. Instead of adding Visual Automata as a dependency, this would be a great opportunity to start integrating that code with your refactors here (since the purpose of that original issue was to move Visual Automata into this library). As you stated, with your refactors, the size of the code will decrease significantly.

@caleb531 do you have any input?

caleb531 commented 1 year ago

@eliotwrobson @khoda81 I am hesitant about adding Visual Automata as a dependency because it would duplicate all of the non-visualization Automata code. So I am more in favor of integrating the code into this library.

While I do like my Visualizer class proposal from #124, I also want to consider what is most practical for people using this library. If this PR provides the right IPython / Jupyter integrations (i.e. something more practical than the Visualizer class idea), then I'd be fine with that direction. Especially considering that I am not very familiar with IPython or Jupyter.

I suppose if we nix the Visualizer class, I will again raise my concern about the DFA/NFA classes becoming too bloated, but I also want to recognize that practicality beats purity.

Also cc'ing @lewiuberg on this thread so that he's in the loop about this proposed new direction, especially since he was already supposedly starting on the Visualizer implementation for #124. @lewiuberg, if you have any thoughts on this, please feel free to chime in.

khoda81 commented 1 year ago

@eliotwrobson @caleb531 Alright then, not adding Visual Automata as a dependency makes this implementation a whole lot easier; it's just a matter of stealing the visualizations from Visual Automata and replacing mine with them. This is the implemented behavior in Jupyter:

from automata.fa.dfa import DFA

dfa = DFA(
    states={'q0', 'q1', 'q2'},
    input_symbols={'0', '1'},
    transitions={
        'q0': {'0': 'q0', '1': 'q1'},
        'q1': {'0': 'q0', '1': 'q2'},
        'q2': {'0': 'q2', '1': 'q1'},
    },
    initial_state='q0',
    final_states={'q1'}
)

dfa

dfa

dfa.minify(retain_names=True)

dfa-minify

It would act just like it did before everywhere else and print the result of dfa.__repr__(). There would be no need for any Visualizer class. The integration works by implementing _repr_mimebundle_ (implemented in FA), which is only called by Jupyter to check whether any visualization is available; in case of an error, it falls back to __repr__. This would be way more practical. The purpose of the show_diagram method would then be to give the user options to customize visualizations and save them if needed.
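
A rough sketch of that mechanism (illustrative, not the exact code in this PR; it assumes show_diagram returns a graphviz-style object with a pipe method):

class FA:
    def _repr_mimebundle_(self, include=None, exclude=None):
        # Jupyter calls this hook, when present, to ask for rich output.
        try:
            graph = self.show_diagram()
            return {
                "image/svg+xml": graph.pipe(format="svg").decode("utf-8"),
                "text/plain": repr(self),
            }
        except Exception:
            # If rendering fails (e.g. a missing dependency), fall back to
            # the ordinary repr, so behavior outside Jupyter is unchanged.
            return {"text/plain": repr(self)}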

lewiuberg commented 1 year ago

Hi guys! First, sorry for being a bit on/off; I have had a crazy period at work. I have been working on this from time to time, but since automata isn't really something that affects my work, the direction we decide on isn't that important to me. If you guys want to "steal" the code or add it as a dependency, that is fine by me. I just feel it is very beneficial for learning purposes and I don't want all the time I invested in it to go to waste. That said, I think implementing it in automata-lib would be the most effective approach in terms of maintenance.

One cool thing I want to share is that I just spoke to a professor at Columbia University, and they are now starting to use Visual-Automata in their classes 😃

eliotwrobson commented 1 year ago

One cool thing I want to share is that I just spoke to a professor at Columbia University, and they are now starting to use Visual-Automata in their classes 😃

That's outstanding! That's actually part of why I really like this PR: integration with something like Jupyter makes it much easier to create self-contained course content using this library.

caleb531 commented 1 year ago

@khoda81 This is a great start so far! Thank you so much for what you are contributing.

I do have a few points of constructive feedback:

  1. I've tried to chime in on all the comments that @eliotwrobson left for you. I agree with most of his feedback, and would ask that you implement those suggestions of his on which I have indicated approval.
  2. The automated tests are currently failing. Please make sure to run nose2 while you are developing and fix any broken tests (or, as needed, add new tests to maintain code coverage).
  3. I think my recent merge of #130 has caused a conflict in this branch, so that will need to be resolved as well before I am able to merge this PR.

caleb531 commented 1 year ago

@khoda81 Just a heads up—for more consistent code style across the entire codebase, I plan to integrate the black auto-formatter and apply it to all Python files in the library.

Once I do this on the develop branch, I will update this PR to auto-format the files you've already modified. This should prevent future formatting conflicts, and also eliminate some feedback loops in this PR for pedantic topics like quote style. 😅

cc @eliotwrobson

caleb531 commented 1 year ago

@khoda81 Okay, I've added black to the develop branch, and have rebased your feature branch (jupyter-integration) off of my official develop branch. Please make sure to pull in these new (rebased) changes properly.

I've done my best to resolve conflicts and preserve the integrity of your changes, but please double-check just to be sure.

cc @eliotwrobson

khoda81 commented 1 year ago

@caleb531 black is a great addition! It would be cool to add it as a pre-commit hook. I might do that after this PR. It would help greatly with contributions. I usually use Gymnasium's pre-commit hooks as a template for my own projects.

caleb531 commented 1 year ago

It would be cool to add it as a pre-commit hook. I might do that after this PR. It would help greatly with contributions. I usually use Gymnasium's pre-commit hooks as a template for my own projects.

@khoda81 So I have limited knowledge of pre-commit hooks, but from what I understand, they can't be committed to the repository directly and require some manual setup to activate. So it seems like the pre-commit hook is only good for those contributors who take the time to set it up. I guess the same could be said of editor plugins.

In your experience, is there a good way to streamline the management/adoption of pre-commit hooks for a project?

cc @eliotwrobson (in case you have any thoughts on this)

khoda81 commented 1 year ago

@caleb531 To add them to this project, we only need to add a .pre-commit-config.yaml file to the root of the project, which lists the hooks (like black, trailing-whitespace, isort, etc.) and their configuration. What changes for contributors is explained in the contribution file here (which this project also needs, btw). The setup is pretty minimal: pre-commit only has to be pip installed once, and most hooks are then set up automatically. Visit pre-commit.com for more info.

caleb531 commented 1 year ago

@khoda81 This project does have a CONTRIBUTING file, but it's under the .github subdirectory:

https://github.com/caleb531/automata/blob/develop/.github/CONTRIBUTING.md

Anyway, you've persuaded me that a pre-commit hook would be a valuable addition to this project. I would greatly appreciate a separate PR for this if you have the time to contribute one.

khoda81 commented 1 year ago

@caleb531 my bad, I somehow missed the file. I will add pre-commit hooks in a separate PR.

khoda81 commented 1 year ago

@eliotwrobson @caleb531 I have been playing around with different implementations of the step visualization feature and I have two ways of achieving it:

  1. Adding an alternate method to DFA/NFA that returns the steps with a data structure that is easier to work with for visualization
  2. Adding a switch case for DFA/NFA in the show_diagram method to translate the result of read_input_stepwise based on the class that we are working with.

The first solution is more on brand with the rest of this PR, but it would add another method purely for the purpose of visualization, which I have been trying to avoid. The upside is that, if implemented properly, it could replace read_input_stepwise, giving a more generalized API for FAs and making it easier to add visualizations for PDAs or any custom automata as well. The second solution would probably be less code and would reuse the existing code. We could also use the second option as a temporary solution and play around with different generalizations of FAs in a separate PR. Any thoughts on this?
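
To make option 1 concrete, a step-reporting generator for a DFA could look roughly like the following; the method name and tuple layout are hypothetical, purely for illustration:

def read_input_steps(self, input_str):
    # Hypothetical generator: yields (current_state, symbol, next_state)
    # for each transition taken, which is exactly the edge information a
    # step visualization needs in order to highlight the path.
    current_state = self.initial_state
    for symbol in input_str:
        next_state = self.transitions[current_state][symbol]
        yield current_state, symbol, next_state
        current_state = next_state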

eliotwrobson commented 1 year ago

@khoda81 For option 1, do you mean that the method would return a data structure holding information about each step that is taken, and then the visualization code could just read this directly? Does this differ greatly from just yielding all results from the generator given by read_input_stepwise? I don't have an immediate strong opinion either way.

caleb531 commented 1 year ago

@khoda81 I think any major re-architecture of the input-reading APIs will require some iteration and proof-of-concepts, for sure. On that note, I like the idea of starting with the second option since it seems like it would have less of an API impact, and would be a good starting point to explore larger changes.

Out of curiosity, what kind of detail is missing from the current input-reading API that a new API would be more useful in providing?

khoda81 commented 1 year ago

@caleb531 @eliotwrobson My proposal for the new API is here: #138. I need read_input_stepwise to tell me which edges were used in every single transition. I will handle NFA/DFA input reading separately using a good old if-else for now...

BTW, I have a hard time figuring out what the purpose of the GNFA class is. It does not seem to be used anywhere internally.

khoda81 commented 1 year ago

Step visualizations are implemented and seem to be working properly. I am torn between different ways of labeling transitions so that the diagram stays concise without getting cluttered.

step_viz1 step_viz3 step_viz5

@eliotwrobson @caleb531 any opinions/proposals?

eliotwrobson commented 1 year ago

@khoda81 I think it's always going to be a bit tricky to visualize D/NFAs reading input much longer than their number of states (things are inherently going to get cluttered; an animation would be better than a static image, but that's way out of scope, I think). I like the way you have the color grading become more green as more of the input is being read. Maybe try italicizing the [#1] labels so they stand out from the input symbols? Overall, things look good though.

Could you try an example with a somewhat larger DFA constructed using the from_finite_language method? I think it might be a common use case for people.

I haven't looked at the code, but it seems like a few of the backend changes might overlap a bit with https://github.com/caleb531/automata/pull/139, and this branch is already a bit out of date with develop. Once that PR is merged, the next step is to rebase the changes here and see if any logic can be condensed (cc @caleb531 to see if we can get that PR merged soon, since it's already been approved).

khoda81 commented 1 year ago

@eliotwrobson

I like the way you have the color grading become more green as more of the input is being read

Credit for that idea goes to lewiuberg. It also goes from yellow to red if the input is not accepted.

Bold: language_accepted_b
Italic: language_accepted_i
Italic & Bold: language_accepted_b_inner_i_new

And here is a rejection example

language_failed_b_inner_i_new

Here is my playground notebook + diagrams

Playground.zip

eliotwrobson commented 1 year ago

@khoda81 #139 was recently merged, you should be good to rebase!

khoda81 commented 1 year ago

@eliotwrobson With NFA input visualization implemented the only remaining thing to add is the table feature, which includes:

  1. fa.table: returns a table of all transitions
  2. fa.input_check(input): returns the history of states when reading input

You can see examples of this in VisualAutomata's README. The only problem is that it uses pandas, which means that to add this feature we would have to add pandas as a dependency. So is it worth it?

eliotwrobson commented 1 year ago

@khoda81 Looking into it, we already have networkx as a dependency, which has pandas as a dependency (I don't quite understand the reason for this, but it is what it is), so adding pandas doesn't really change anything. However, I'm not really sure about the utility of the table feature, since there are a number of other ways to read the transitions of a DFA/NFA, and people can load them into a table (or other frontend display) on their own if they want. The code is short enough that it might be better suited as an example we provide with the library (see https://github.com/caleb531/automata/issues/103).
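
For example, with the DFA from earlier in this thread, a user can already build such a table themselves in a couple of lines (assuming pandas is installed and that the transitions attribute is the plain dict-of-dicts passed to the constructor):

import pandas as pd

from automata.fa.dfa import DFA

dfa = DFA(
    states={"q0", "q1", "q2"},
    input_symbols={"0", "1"},
    transitions={
        "q0": {"0": "q0", "1": "q1"},
        "q1": {"0": "q0", "1": "q2"},
        "q2": {"0": "q2", "1": "q1"},
    },
    initial_state="q0",
    final_states={"q1"},
)

# Rows are source states, columns are input symbols, cells are target states.
transition_table = pd.DataFrame.from_dict(dfa.transitions, orient="index")
print(transition_table)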

Whatever direction we go with, I think it should be done in a separate PR as this one is already getting quite large. @caleb531 any thoughts on this?

Also, I'm not sure why but the diff seems to include all of the existing changes that are already on the develop branch? This makes the code very hard to review.

caleb531 commented 1 year ago

@khoda81 @eliotwrobson From the sound of it, and after reviewing the VisualAutomata docs, I am not a huge fan of the table features (both fa.table and fa.input_check), since we would be hardcoding the implementation to a specific style/format, and unless we provide a lot of options to control the display, the end user will probably write their own implementation anyway.

But if you want to discuss it further, I would prefer a separate PR anyway.

khoda81 commented 1 year ago

@eliotwrobson The diff problem should be fixed now, I forgot to sync my fork before rebasing. I am starting to fix the tests and add visualization documentation.

eliotwrobson commented 1 year ago

@khoda81 I just pushed some minor changes to the develop branch, so there are a couple of small merge conflicts that have to be resolved. It should be pretty straightforward to address those along with the remaining comments on here (I don't think there are too many).

Feel free to let me know if you are busy and would like some help. It would be nice to have this merged soon, since I think we're getting close to moving through enough items to get to a v8 release.

khoda81 commented 1 year ago

@eliotwrobson Sorry for the delay; I am taking a break for now, since I'm going through university midterms right now as an undergrad. :) I pushed the remaining minor changes to this branch for now, but absolutely go ahead if you can. It might take another week or two for me to come back. All planned features are implemented; this PR just needs some finishing touches...

eliotwrobson commented 1 year ago

@khoda81 no worries! I'm a bit busy this week too wrapping up my semester. I'll try to get some changes on this branch after that, since I'm shooting to get most of the work done for the next release before I go on vacation.

eliotwrobson commented 1 year ago

@khoda81 I made some pushes, so be sure to pull. The main changes were addressing the items listed in the discussions and cleaning up some types. I think the main remaining changes are to polish up the handling of optional dependencies, update the test cases to increase code coverage, and simplify the NFA algorithms.

I think the best way to handle the optional dependencies is to bundle them together and use the method here: https://stackoverflow.com/a/27361558

EDIT: @khoda81 After playing around some, would it be possible to switch from the graphviz library to the pygraphviz library to generate the images? I think it would make it easier to adapt the existing test cases. I understand though if this is difficult to do.

EDIT 2: I was able to switch from graphviz to pygraphviz with pretty minimal changes. Parts of the API need to be reworked, and the Jupyter integration with _repr_mimebundle_ needs to be polished up (in particular, the attachment point of the arrows is kinda weird), but this is going to make it much easier to test the diagram creation code, and means our existing tests don't have to be completely thrown out.

For an example of Jupyter integration to follow, we can look at what they do in graphviz: https://github.com/xflr6/graphviz/blob/master/graphviz/jupyter_integration.py

Although it's going to take some effort to integrate this slightly different architecture, I think it's more flexible, robust, and easier to test than what was there before.

EDIT 3: Although this technically gets rid of the functionality that allows for the simplest display of reading input in Jupyter notebooks, I'm not sure it's worth reworking the code to add that functionality back in. It only requires one extra line to get that to work, and not having it removes the dependence on IPython entirely (which may be desirable). Curious what others think (cc @caleb531). We should probably have some discussion here before proceeding, though the core intrusive changes to the other parts of the library are probably nailed down. @khoda81 maybe we can use something like the old Jupyter integration you had, where there was an Ipythongraph class.

caleb531 commented 1 year ago

@eliotwrobson

It only requires one extra line to get that to work, and not having that removes the dependence on IPython entirely (which may be desirable). Curious what others think (cc @caleb531)

I like the idea of keeping it simple and avoiding wrangling in large dependencies when they could be easily added on a per-user basis.

I also agree about the switch to pygraphviz for the reasons you presented.

@eliotwrobson Is there anything more you plan to add to this PR? And @khoda81 was there anything more you want to review here or discuss based on Eliot's recent changes?

eliotwrobson commented 1 year ago

@caleb531

Is there anything more you plan to add to this PR? And @khoda81 was there anything more you want to review here or discuss based on Eliot's recent changes?

So in terms of features, I think that what's here is fine. The issue is that to remove the dependency and switch to pygraphviz over graphviz, I had to remove the feature where the diagram was displayed directly in the notebook on the show_diagram call. This is because the pygraphviz graph doesn't have Jupyter notebook integration similar to that of graphviz.

To get around this, I looked at a previous commit where @khoda81 created an Ipythongraph class, which was essentially a wrapper class facilitating a nicer Jupyter display. I think we could add this class back and have it inherit from the AGraph class from pygraphviz. Then all that remains is to update the tests and clean things up. I thought about changing the recursive NFA path algorithm to an iterative one, but that can be done later.

I'm a bit busy right now and will be for at least the next week, so if someone wants to pick this up, the changes should be pretty straightforward.

EDIT: Ok so apparently the AGraph class already has these repr features for displaying as an SVG. Not sure why they weren't working for me when I tried it, but this should mean that the only code changes remaining will be updating tests.

khoda81 commented 1 year ago

@eliotwrobson I have pulled the changes, and my Windows setup is broken (I have not installed pygraphviz yet). The typing at

https://github.com/caleb531/automata/blob/5c44c0e459dd27035728a6b46a7bafccfdbf7631/automata/fa/fa.py#L79

seems to throw an error when pygraphviz is not installed. Also, I'm having some issues installing pygraphviz on Windows. A simple pip install does not install pygraphviz; there is a conda channel that provides it on Windows, but I would prefer not to use that route because it would make the installation process more difficult, since pygraphviz would still have to be installed separately and adding it to the dependencies would not help.

EDIT: Official installation guide: https://pygraphviz.github.io/documentation/stable/install.html#windows

eliotwrobson commented 1 year ago

@khoda81 if you add from __future__ import annotations, that should fix it. But you'll need pygraphviz to be able to work on the tests anyway.
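
For reference, a minimal sketch of why that helps (the function name here is illustrative, not the actual fa.py code): with postponed annotations, the pgv.AGraph annotation is stored as a string and never evaluated at runtime, so it no longer fails when the guarded import did not succeed.

from __future__ import annotations  # annotations are stored as strings, not evaluated

try:
    import pygraphviz as pgv
except ImportError:
    pgv = None  # visualization dependencies are optional


def to_graph(fa) -> pgv.AGraph:
    # Without the __future__ import, evaluating this return annotation at
    # definition time raises an error whenever pygraphviz is missing.
    ...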

coveralls commented 1 year ago

Coverage Status

coverage: 99.883% (-0.1%) from 100.0% when pulling 1bf1d7418bf86620b5f4652f6aad0ca0136c31f5 on khoda81:jupyter-integration into b0cde3f236a6b1001b60fed123599f13a3dba9e9 on caleb531:develop.

coveralls commented 1 year ago

Coverage Status

Coverage: 100.0%. Remained the same when pulling 954b57e150462c597897e52e655c4a33e3a788ec on khoda81:jupyter-integration into b0cde3f236a6b1001b60fed123599f13a3dba9e9 on caleb531:develop.

eliotwrobson commented 1 year ago

@caleb531 @khoda81 So I've updated the test cases, and it looks like everything is passing with coverage. I think this should be ready to start reviewing. A few notes:

caleb531 commented 1 year ago

@eliotwrobson @khoda81 Sorry for the delay—I've had a busy week so far. A few points:

  1. @eliotwrobson I left you a question about the optional dependencies—just want to make sure we have that cleared up before I merge anything
  2. The removal of GNFA.show_None is fine with me, and the rationale makes sense.
  3. I've noticed Coveralls sometimes displays higher coverage than I was expecting, so once this PR is merged, I will run a coverage report myself (locally) to see how it compares

Now, there's a ton of code in this PR, so I don't have time to review all of it today. But on the surface (for what I've skimmed), it looks pretty good.

khoda81 commented 1 year ago

@caleb531 There is a slight problem with replacing graphviz with pygraphviz. I have a working installation of the graphviz binary and the graphviz Python package, but a simple pip install of pygraphviz will not work on Windows because of build issues; it needs to be installed with:

python -m pip install --global-option=build_ext `
              --global-option="-IC:\Program Files\Graphviz\include" `
              --global-option="-LC:\Program Files\Graphviz\lib" `
              pygraphviz

This then becomes part of the installation process of automata lib (on Windows), even though the graphviz binary needs to be installed separately anyway. I should also add that automata lib cannot be installed on Windows at all if we add pygraphviz as a required dependency, since the pygraphviz installation would fail and pip would terminate the entire installation.

Since graphviz did not provide a proper API for testing purposes, one solution would be to switch back to pydot, but that might need to wait for another PR.

@eliotwrobson I have tested the optional dependencies and they did work for me, though, as mentioned above, I needed to install pygraphviz before running pip install -e .[visual].

eliotwrobson commented 1 year ago

@khoda81 I don't think the issue you're getting at is as big of a deal in light of the introduction of optional dependencies. Python packages that interface with other applications are always going to be harder to install across different operating systems (this is the reason why, on my Windows machine, I used conda to set up pygraphviz). As long as the package works without the optional dependencies installed, I think we're in good shape.

@caleb531 No worries on the review! I think I might see if I can run coveralls locally and see what happens to try to boost coverage while you're taking a look. WRT optional dependencies, I think that everything works (besides the display function) if visual dependencies are not installed because of the import guard I set up. I wonder if there's a way to set up automated testing for that.

EDIT: Actually, I'm not sure how to run coveralls locally...

caleb531 commented 1 year ago

@eliotwrobson

I'm not sure how to run coveralls locally...

Oh hmm, I can tell you how to run coverage locally, but I've never tried to run Coveralls locally. It's strange though, because the CI job is just generating an lcov file, which is then being fed to Coveralls. So I wonder if Coveralls is actually doing anything beyond parsing the coverage data you feed it.

WRT optional dependencies, I think that everything works (besides the display function) if visual dependencies are not installed because of the import guard I set up. I wonder if there's a way to set up automated testing for that.

I think this could be accomplished with a second CI job that executes only visualization-related tests with the additional dependencies.
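
Alongside (or instead of) a separate CI job, the visualization test modules could also skip themselves when the optional packages are missing; a minimal unittest-style sketch (nose2 runs unittest tests), with illustrative class and test names:

import unittest

try:
    import pygraphviz  # noqa: F401

    VISUAL_DEPS_INSTALLED = True
except ImportError:
    VISUAL_DEPS_INSTALLED = False


@unittest.skipUnless(VISUAL_DEPS_INSTALLED, "optional visual dependencies not installed")
class TestDiagramRendering(unittest.TestCase):
    def test_visual_dependencies_importable(self):
        # Placeholder check; real tests would render a diagram and inspect
        # the resulting graph object.
        self.assertTrue(VISUAL_DEPS_INSTALLED)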

eliotwrobson commented 1 year ago

@caleb531 ahh I see, I think coveralls is parsing the data correctly but something about the coverage configuration might be off. I wasn't able to get nose2 coverage running locally.

eliotwrobson commented 1 year ago

@caleb531 @khoda81 I've pushed changes (https://github.com/caleb531/automata/pull/129/commits/10706a6406d9057a727e2ab49371a3f0b354c009) that fix the correctness issue I was seeing (at least when testing with the small example NFA), along with tests verifying this for that particular instance. A few previously uncovered lines have been exposed, but the test cases check for most of the important issues related to correctness.

The preceding issue was the last major obstacle for this PR. All that remains is to add a few test cases that ensure correctness and cover the last remaining lines. I should be able to tackle some of these tests, but I would greatly appreciate some help, as there are a few pesky uncovered lines that require some odd setup code to hit. I also want to be sure about the optimality of the new accepting-path-finding algorithm.

EDIT: After making another pass, all that's left is to test that everything works even with the optional imports.

caleb531 commented 1 year ago

@eliotwrobson @khoda81 At this point, I'm tempted to just bypass the slight drop in coverage (to address in a separate PR) for the sake of merging this PR. It will unblock #147 and any other future PRs that touch these files (which would often be the case). I've also been realizing that achieving a perfect 100% code coverage can be impractical or even unattainable, sometimes.

@eliotwrobson I think my only requirement for wrapping up this PR at this point would be to work through the remaining comment threads to reach resolution for all of them.

eliotwrobson commented 1 year ago

@caleb531 sounds like a plan. I closed all of the conversations related to changes that I already made. There are only 4 remaining ones open, and they're all fairly tangential or something that we can resolve with testing later (minor stuff related to how the project is configured). I'll make another pass.

eliotwrobson commented 1 year ago

@caleb531 I've resolved all of the open comment threads on this PR. The only dangling items are:

Other than those, we have very high coverage and tests for almost everything that's been added. Once you give the code here a quick look-over and are comfortable, I think we're good to go ahead and merge!