khoda81 closed this pull request 1 year ago.
@khoda81 This is really cool! I haven't looked at the code super closely, but would it be possible to use this Jupyter integration as part of an optional dependency, as in the discussion here: https://github.com/caleb531/automata/issues/124
I think that it would be really nice to have the changes here accompany a move towards the architecture described in the issue above.
@eliotwrobson Looking at Visual Automata, it seems like I reinvented the wheel. I wish it were mentioned in the project readme.
I agree that the `DFA`/`NFA`/`GNFA` classes need some refactoring action; the way I implemented this visualization is by adding IPython integration in the `FA` class and implementing a few methods (like `iter_states`, `iter_transitions`, `is_accepted`, ...) in the `DFA`/`NFA`/`GNFA` classes to refactor the visualization logic. The visualization code does not care about what is being visualized; it just calls the `iter_*` methods. This also makes it really easy to add visualization for PDAs and others, which is (for now) lacking from Visual Automata. Refactoring this logic would cut the size of the code almost in half.
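To sketch the idea (toy stand-in classes here, not the real automata-lib code): the drawing logic depends only on a small iterator interface, so any automaton class that implements it can be visualized by the same code.

```python
# Toy sketch of the refactor described above: the visualization logic
# depends only on a small iterator interface, not on any concrete
# automaton class. (Illustrative code, not the real automata-lib.)
class ToyDFA:
    def __init__(self, transitions):
        # transitions: {state: {symbol: target_state}}
        self.transitions = transitions

    def iter_states(self):
        yield from self.transitions

    def iter_transitions(self):
        for state, moves in self.transitions.items():
            for symbol, target in moves.items():
                yield state, target, symbol


def describe_edges(automaton):
    # Generic "visualization" code: it works for any class exposing
    # iter_transitions(), so adding PDA support only means implementing
    # the same iterators there.
    return [f"{src} --{sym}--> {dst}"
            for src, dst, sym in automaton.iter_transitions()]
```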
We can add Visual Automata as an optional dependency and surround the import with a try-except block, just like it is done in many libraries such as Flask. The rest is just replacing my visualizations in the `FA` class with the Visual Automata ones, printing a warning if the import failed, and falling back to the default `repr` of the class. This way, the only code added to this library would be a few lines in the `FA` class, plus a few utility methods (`iter_states`, `iter_transitions`, `is_accepted`, ...) in `DFA`/`NFA`/`GNFA`.

This would also remove the need for a `VisualAutomata`/`Visualizer` class for visualizations, because the IPython integration methods are only called by Jupyter. So there is no need to wrap the `DFA`: it will use the visualization automatically when used in Jupyter, and act just like it did before everywhere else (and when the Visual Automata import fails).
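A minimal sketch of that import-guard idea, assuming a hypothetical optional package name (`visual_automata` stands in for whatever the optional dependency ends up being; none of this is the actual automata-lib code):

```python
# Guard the optional import; everything else works without it.
try:
    import visual_automata  # hypothetical optional dependency
    _HAS_VISUAL = True
except ImportError:
    visual_automata = None
    _HAS_VISUAL = False


class FA:
    """Illustrative base class with a Jupyter display hook."""

    def __repr__(self):
        return f"{type(self).__name__}(...)"

    def _repr_mimebundle_(self, include=None, exclude=None):
        # Jupyter calls this hook automatically; everywhere else the
        # ordinary __repr__ is used, so non-notebook behavior is
        # unchanged.
        if not _HAS_VISUAL:
            # Fall back to the default repr when the import failed.
            return {"text/plain": repr(self)}
        return {"image/svg+xml": "<svg>...</svg>"}  # placeholder
```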
If my rough explanations made sense and this is what you expected, I will implement these changes.
@khoda81 I think the way you refactored this is definitely more towards the direction that I think we want to go in with respect to the visualizations described in that issue. Instead of adding Visual Automata as a dependency, this would be a great opportunity to start to integrate that code with your refactors in here (since the purpose of that original issue was to move Visual Automata into this library). As you stated, with your refactors, the size of the code will decrease significantly.
@caleb531 do you have any input?
@eliotwrobson @khoda81 I am hesitant about adding Visual Automata as a dependency because it would duplicate all of the non-visualization Automata code. So I am more in favor of integrating the code into this library.
While I do like my `Visualizer` class proposal from #124, I also want to consider what is most practical for people using this library. If this PR provides the right IPython / Jupyter integrations (i.e. something more practical than the `Visualizer` class idea), then I'd be fine with that direction, especially considering that I am not very familiar with IPython or Jupyter.
I suppose if we nix the Visualizer class, I will again raise my concern about the DFA/NFA classes becoming too bloated, but I also want to recognize that practicality beats purity.
Also cc'ing @lewiuberg on this thread so that he's in the loop about this proposed new direction, especially since he was already supposedly starting on the Visualizer implementation for #124. @lewiuberg, if you have any thoughts on this, please feel free to chime in.
@eliotwrobson @caleb531 Alright then: not adding Visual Automata as a dependency makes this implementation a whole lot easier; it's just a matter of stealing the visualizations from Visual Automata and replacing mine with them. This is the implemented behavior in Jupyter:
```python
from automata.fa.dfa import DFA

dfa = DFA(
    states={'q0', 'q1', 'q2'},
    input_symbols={'0', '1'},
    transitions={
        'q0': {'0': 'q0', '1': 'q1'},
        'q1': {'0': 'q0', '1': 'q2'},
        'q2': {'0': 'q2', '1': 'q1'},
    },
    initial_state='q0',
    final_states={'q1'}
)
dfa
dfa.minify(retain_names=True)
```
It would act just like it did before everywhere else and print the result of `dfa.__repr__()`. There would be no need for any `Visualizer` class. These integrations happen by implementing `_repr_mimebundle_` (implemented in `FA`), which is only called by Jupyter to check whether any visualization is available; in case of an error, it would fall back to `__repr__`. This would be way more practical.

Then the purpose of the `show_diagram` method would be to give the user options to customize visualizations and save them if needed.
Hi guys! First, sorry for being a bit on/off; I have had a crazy period at work. I have been working on this from time to time, but since automata isn't really something that affects my work, the direction we decide on isn't that important to me. If you guys want to "steal" the code or add it as a dependency, that is fine by me. I just feel it is very beneficial for learning purposes, and I don't want all the time I invested in it to go to waste. Although, I do think implementing it in automata-lib would be the most effective in terms of maintenance.
One cool thing I want to share is that I just spoke to a professor at Columbia University, and they are now starting to use Visual-Automata in their classes 😃
> One cool thing I want to share is that I just spoke to a professor at Columbia University, and they are now starting to use Visual-Automata in their classes 😃
That's outstanding! That's actually part of why I really like this PR: the integration with something like Jupyter makes it much easier to create self-contained course content using this library.
@khoda81 This is a great start so far! Thank you so much for what you are contributing.
I do have a few points of constructive feedback:

- Run `nose2` while you are developing and fix any broken tests (or, as needed, add new tests to maintain code coverage)

@khoda81 Just a heads up—for more consistent code style across the entire codebase, I plan to integrate the black auto-formatter and apply it to all Python files in the library.
Once I do this on the `develop` branch, I will update this PR to auto-format the files you've already modified. This should prevent future formatting conflicts, and also eliminate some feedback loops in this PR for pedantic topics like quote style. 😅
cc @eliotwrobson
@khoda81 Okay, I've added black to the `develop` branch, and have rebased your feature branch (`jupyter-integration`) off of my official `develop` branch. Please make sure to pull in these new (rebased) changes properly.
I've done my best to resolve conflicts and preserve the integrity of your changes, but please double-check just to be sure.
cc @eliotwrobson
@caleb531 black is a great addition! It would be cool to add it as a pre-commit hook. I might do that after this PR. It would help greatly with contributions. I usually use Gymnasium's pre-commit hooks as a template for my own projects.
> It would be cool to add it as a pre-commit hook. I might do that after this PR. It would help greatly with contributions. I usually use Gymnasium's pre-commit hooks as a template for my own projects.
@khoda81 So I have limited knowledge of pre-commit hooks, but from what I understand, they can't be committed to the repository directly and require some manual setup to activate. So it seems like the pre-commit hook is only good for those contributors who take the time to set it up. I guess the same could be said of editor plugins.
In your experience, is there a good way to streamline the management/adoption of pre-commit hooks for a project?
cc @eliotwrobson (in case you have any thoughts on this)
@caleb531 To add them to this project, we only need to add a `.pre-commit-config.yaml` file to the root of the project, which contains the hooks (like black, trailing-whitespace, isort, etc.) and the configurations related to them.
What changes for contributors is explained in the contribution file here (which this project also needs, btw). The setup is pretty minimal: pre-commit can be pip-installed once, and most hooks are also set up automatically. Visit pre-commit.com for more info.
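For illustration, a minimal `.pre-commit-config.yaml` with the hooks mentioned above could look something like this (the `rev` versions are just placeholders and should be pinned to current releases):

```yaml
# Sketch of a .pre-commit-config.yaml at the repo root; versions are
# illustrative placeholders.
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
```

Contributors then just run `pip install pre-commit` once, followed by `pre-commit install` in the repo.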
@khoda81 This project does have a CONTRIBUTING file, but it's under the `.github` subdirectory:
https://github.com/caleb531/automata/blob/develop/.github/CONTRIBUTING.md
Anyway, you've persuaded me that a pre-commit hook would be a valuable addition to this project. I would greatly appreciate a separate PR for this if you have the time to contribute one.
@caleb531 my bad, I somehow missed the file. I will add pre-commit hooks in a separate PR.
@eliotwrobson @caleb531 I have been playing around with different implementations of the step visualization feature, and I have two ways of achieving it:

1. Adding a new method that yields the step information in a generalized form for visualization.
2. Adding logic to the `show_diagram` method to translate the result of `read_input_stepwise` based on the class that we are working with.

The first solution is more on brand with the rest of this PR, but would add another method for the purpose of visualization, which I have been trying to avoid. The upside would be that, if implemented properly, it can replace `read_input_stepwise` and gives a more generalized API for FAs, making it easier to add visualizations for PDAs or any custom automata as well. The second solution, though, would probably be less code and would reuse the existing code.

We can also use the second option as a temporary solution and play around with different generalizations of FAs in a separate PR.
Any thoughts on this?
@khoda81 For option 1, do you mean that the method would return a data structure holding information about each step that is taken, and then the visualization code could just read this directly? Does this differ greatly from just yielding all results from the generator given by `read_input_stepwise`? I don't have an immediate strong opinion either way.
@khoda81 I think any major re-architecture of the input-reading APIs will require some iteration and proof-of-concepts, for sure. On that note, I like the idea of starting with the second option since it seems like it would have less of an API impact, and would be a good starting point to explore larger changes.
Out of curiosity, what kind of detail is missing from the current input-reading API that a new API would be more useful in providing?
@caleb531 @eliotwrobson My proposal for the new API is here #138.
I need `read_input_stepwise` to tell me which edges were used in every single transition. I will handle `NFA`/`DFA` input reading separately using a good old if-else for now...

BTW, I have a hard time figuring out what the purpose of the `GNFA` class is. It does not seem to be used anywhere internally.
Step visualizations are implemented and seem to be working properly. I am torn between different ways of labeling transitions so that the diagram stays concise without getting cluttered. @eliotwrobson @caleb531 any opinions/proposals?
@khoda81 I think it's always going to be a bit tricky visualizing D/NFAs reading input much longer than their number of states (inherently, things are going to get cluttered; it would be better to have an animation rather than a static image, but that's way out of scope, I think). I like the way you have the color grading become more green as more of the input is read. Maybe try italicizing the `[#1]` labels so they stand out from the input symbols? Overall, things look good though.
Could you try an example with a somewhat larger DFA constructed using the `from_finite_language` method? I think it might be a common use case for people.
I haven't looked at the code, but it seems like a few of the backend changes might overlap a bit with https://github.com/caleb531/automata/pull/139, and this branch is already a bit out of date with `develop`. Once that PR is merged, the next step is to rebase the changes here and see if any logic can be condensed (cc @caleb531 to see if we can get that PR merged soon, since it's already been approved).
@eliotwrobson
> I like the way you have the color grading become more green as more of the input is being read

Credit for that idea goes to lewiuberg. It also goes from yellow to red if the input is not accepted.

(Table of example diagrams comparing transition-label styles: Bold, Italic, and Italic & Bold; images omitted.)
@khoda81 #139 was recently merged, you should be good to rebase!
@eliotwrobson With `NFA` input visualization implemented, the only remaining thing to add is the table feature, which includes:

- `fa.table`: returns a table of all transitions
- `fa.input_check(input)`: returns the history of states when reading input

You can see examples of this in Visual Automata's readme. The only problem is that it uses pandas, which means that to add this feature we have to add pandas as a dependency. So is it worth it?
@khoda81 Looking into it, we already have `networkx` as a dependency, which has `pandas` as a dependency (I don't quite understand the reason for this, but it is what it is), so adding `pandas` as a dependency doesn't really change anything. However, I'm not really sure about the utility of the table feature, since there are a number of other ways to read the transitions of a DFA/NFA, and people can load them into a table on their own if they want (or into another frontend display). The code is short enough that it might be better suited for an example we provide with the library (see https://github.com/caleb531/automata/issues/103).
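For instance, rolling your own transition table is nearly a one-liner with pandas (a sketch assuming the usual nested-dict transition format, not anything from the library itself):

```python
import pandas as pd

# Transitions in the usual nested-dict format: rows will be source
# states, columns will be input symbols, cells will be target states.
transitions = {
    'q0': {'0': 'q0', '1': 'q1'},
    'q1': {'0': 'q0', '1': 'q2'},
    'q2': {'0': 'q2', '1': 'q1'},
}
table = pd.DataFrame.from_dict(transitions, orient='index')
```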
Whatever direction we go with, I think it should be done in a separate PR as this one is already getting quite large. @caleb531 any thoughts on this?
Also, I'm not sure why but the diff seems to include all of the existing changes that are already on the develop branch? This makes the code very hard to review.
@khoda81 @eliotwrobson From the sound of it, and after reviewing the Visual Automata docs, I am not a huge fan of the table features (both `fa.table` and `fa.input_check`), since we would be hardcoding the implementation to a specific style/format, and unless we provide a lot of options to control the display, the end user is probably going to write their own implementation anyway.
But if you want to discuss it further, I would prefer a separate PR anyway.
@eliotwrobson The diff problem should be fixed now, I forgot to sync my fork before rebasing. I am starting to fix the tests and add visualization documentation.
@khoda81 I just pushed some minor changes to the develop branch, so there are a couple of small merge conflicts that have to be resolved. It should be pretty straightforward to address those along with the remaining comments on here (I don't think there are too many).
Feel free to let me know if you are busy and would like some help. It would be nice to have this merged soon, since I think we're getting close to moving through enough items to get to a v8 release.
@eliotwrobson Sorry for the delay; I am taking a break for now, since I'm going through university midterms as an undergrad. :) I pushed the remaining minor changes to this branch for now, but absolutely go ahead if you can. It might take another week or two for me to come back. All planned features are implemented; this PR just needs some finishing touches...
@khoda81 no worries! I'm a bit busy this week too wrapping up my semester. I'll try to get some changes on this branch after that, since I'm shooting to get most of the work done for the next release before I go on vacation.
@khoda81 I made some pushes, so be sure to pull. The main changes were addressing the items listed in the discussions and cleaning up some types. I think the main remaining changes are to polish up the handling of optional dependencies, update the test cases to increase code coverage, and simplify the NFA algorithms.
I think the best way to handle the optional dependencies is to bundle them together and use the method here: https://stackoverflow.com/a/27361558
EDIT: @khoda81 After playing around some, would it be possible to switch from the `graphviz` library to the `pygraphviz` library to generate the images? I think it would make it easier to adapt the existing test cases. I understand, though, if this is difficult to do.

EDIT 2: I was able to switch from `graphviz` to `pygraphviz` with pretty minimal changes. Parts of the API need to be reworked, and the Jupyter integration with `_repr_mimebundle_` needs to be polished up (in particular, the attachment point of the arrows is kinda weird), but this is going to make it much easier to test the diagram creation code, and means our existing tests don't have to be completely thrown out.
For an example of Jupyter integration to follow, we can look at what they do in `graphviz`: https://github.com/xflr6/graphviz/blob/master/graphviz/jupyter_integration.py
Although it's going to take some effort to integrate this slightly different architecture, I think it's more flexible, robust, and easier to test than what was there before.
EDIT 3: Although this technically gets rid of the functionality that allows for the simplest display of reading input in Jupyter notebooks, I'm not sure it's worth reworking the code to add that functionality back in. It only requires one extra line to get that to work, and not having it removes the dependence on `IPython` entirely (which may be desirable). Curious what others think (cc @caleb531). We should probably have some discussion here before proceeding, though the core intrusive changes to the other parts of the library are probably nailed down. @khoda81 maybe we can use something like the old Jupyter integration you had, where there was an `Ipythongraph` class.
@eliotwrobson
> It only requires one extra line to get that to work, and not having that removes the dependence on IPython entirely (which may be desirable). Curious what others think (cc @caleb531)
I like the idea of keeping it simple and avoiding wrangling in large dependencies when they could be easily added on a per-user basis.
I also agree about the switch to `pygraphviz` for the reasons you presented.
@eliotwrobson Is there anything more you plan to add to this PR? And @khoda81 was there anything more you want to review here or discuss based on Eliot's recent changes?
@caleb531
> Is there anything more you plan to add to this PR? And @khoda81 was there anything more you want to review here or discuss based on Eliot's recent changes?
So in terms of features, I think that what's here is fine. The issue is that to remove the dependency and switch to `pygraphviz` over `graphviz`, I had to remove the feature where the diagram was displayed directly in the notebook on the `show_diagram` call. This is because the `pygraphviz` graph doesn't have Jupyter notebook integration similar to that of `graphviz`.

To get around this, I looked at a previous commit where @khoda81 created an `Ipythongraph` class, essentially a wrapper class facilitating a nicer Jupyter display. I think we could add this class back and have it inherit from the `AGraph` class from `pygraphviz`. Then all that remains is to update the tests and clean things up. I thought about changing the recursive NFA path algorithm to an iterative one, but that can be done later.
I'm a bit busy right now and will be for at least the next week, so if someone wants to pick this up, the changes should be pretty straightforward.
EDIT: Ok, so apparently the `AGraph` class already has these repr features for displaying as an SVG. Not sure why they weren't working for me when I tried it, but this should mean that the only code changes remaining will be updating the tests.
@eliotwrobson I have pulled the changes, and my Windows setup is broken (I have not installed pygraphviz yet). The typing at https://github.com/caleb531/automata/blob/5c44c0e459dd27035728a6b46a7bafccfdbf7631/automata/fa/fa.py#L79 seems to throw an error when pygraphviz is not installed. Also, I'm having some issues installing pygraphviz on Windows: a simple pip install does not install pygraphviz, and while there is a conda channel to install it on Windows, I prefer not to use it, because it would make the installation process more difficult, since it is required to be installed separately and adding it to the dependencies would not help.
EDIT: Official installation guide: https://pygraphviz.github.io/documentation/stable/install.html#windows
@khoda81 if you set `from __future__ import annotations`, that should fix it. But you'll need pygraphviz to be able to work on the tests anyway.
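To illustrate why that fixes it: with postponed evaluation of annotations (PEP 563), annotations are stored as strings and never evaluated at runtime, so a type from an uninstalled optional package is harmless (the function below is purely hypothetical):

```python
from __future__ import annotations

# pygraphviz is never imported in this module; with PEP 563 postponed
# evaluation, the annotation below is stored as a string and never
# evaluated, so the module imports fine without pygraphviz installed.
def render(graph: pygraphviz.AGraph) -> str:
    return "rendered"
```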
@caleb531 @khoda81 So I've updated the test cases, and it looks like everything is passing with coverage. I think this should be ready to start reviewing. A few notes:

- I removed the `show_None` attribute from the display for `GNFA`, since it seemed kinda pointless to display a bunch of emptyset symbols everywhere.

@eliotwrobson @khoda81 Sorry for the delay—I've had a busy week so far. A few points:

- Removing `show_None` from the `GNFA` display is fine with me, and the rationale makes sense

Now, there's a ton of code in this PR, so I don't have time to review all of it today. But on the surface (for what I've skimmed), it looks pretty good.
@caleb531 There is a slight problem with replacing graphviz with pygraphviz. I have a working installation of the graphviz binary and the graphviz Python package. A simple pip install will not work on Windows because of build issues, so it needs to be installed with:

```powershell
python -m pip install --global-option=build_ext `
    --global-option="-IC:\Program Files\Graphviz\include" `
    --global-option="-LC:\Program Files\Graphviz\lib" `
    pygraphviz
```

which as a result becomes part of the installation process of automata-lib (on Windows), while the graphviz binary needs to be installed separately anyway. And I need to add: automata-lib could not be installed on Windows at all if we added pygraphviz as a required dependency, since the installation of pygraphviz would fail and pip would terminate the entire installation.
Since graphviz did not provide a proper API for testing purposes, one solution would be to switch back to pydot, but that might need to remain for another PR.
@eliotwrobson I have tested the optional dependencies and they did work for me, though as mentioned above, I needed to install pygraphviz before I ran `pip install -e .[visual]`.
@khoda81 I don't think the issue you're getting at is as big of a deal in light of the introduction of optional dependencies. Python packages that interface with other applications are always going to be harder to install across different operating systems (this is the reason why, on my Windows machine, I used conda to set up pygraphviz). As long as the package works without having to install the optional dependencies, I think we're in good shape.
@caleb531 No worries on the review! I think I might see if I can run coveralls locally and see what happens to try to boost coverage while you're taking a look. WRT optional dependencies, I think that everything works (besides the display function) if visual dependencies are not installed because of the import guard I set up. I wonder if there's a way to set up automated testing for that.
EDIT: Actually, I'm not sure how to run coveralls locally...
@eliotwrobson
> I'm not sure how to run coveralls locally...
Oh hmm, I can tell you how to run `coverage` locally, but I've never tried to run Coveralls locally. It's strange though, because the CI job is just generating an `lcov` file, which is then fed to Coveralls. So I wonder if Coveralls is actually doing anything beyond parsing the coverage data you feed it.
> WRT optional dependencies, I think that everything works (besides the display function) if visual dependencies are not installed because of the import guard I set up. I wonder if there's a way to set up automated testing for that.
I think this could be accomplished with a second CI job that executes only visualization-related tests with the additional dependencies.
@caleb531 ahh I see, I think coveralls is parsing the data correctly but something about the coverage configuration might be off. I wasn't able to get nose2 coverage running locally.
@caleb531 @khoda81 I've pushed changes https://github.com/caleb531/automata/pull/129/commits/10706a6406d9057a727e2ab49371a3f0b354c009 that fix the correctness issue I was seeing (at least when testing with the small example NFA), along with tests verifying this for that particular instance. There have been a few uncovered lines that have been exposed, but the test cases are checking for most of the important issues related to correctness.
The preceding issue was the last major obstacle to this PR. All that remains is to add a few test cases that ensure correctness and cover the last remaining lines. I should be able to attack some of these tests, but would greatly appreciate some help on this, as there are a few pesky uncovered lines that require some weird setup code to hit. I also want to be sure about the optimality of the new accepting path finding algorithm.
EDIT: After making another pass, all that's left is to test that everything works even with the optional imports.
@eliotwrobson @khoda81 At this point, I'm tempted to just bypass the slight drop in coverage (to address in a separate PR) for the sake of merging this PR. It will unblock #147 and any other future PRs that touch these files (which would often be the case). I've also been realizing that achieving a perfect 100% code coverage can sometimes be impractical or even unattainable.
@eliotwrobson I think my only requirement for wrapping up this PR at this point would be to work through the remaining comment threads to reach resolution for all of them.
@caleb531 sounds like a plan. I closed all of the conversations related to changes that I already made. There are only 4 remaining ones open, and they're all fairly tangential or something that we can resolve with testing later (minor stuff related to how the project is configured). I'll make another pass.
@caleb531 I've resolved all of the open comment threads on this PR. The only dangling items are:
Other than those, we have very high coverage and tests for most everything that's been added. Once you give the code here a quick lookover / are comfortable, I think we're good to go ahead and merge!
Changes in this PR
Examples of visual changes
More Example images
images.zip (the ones not included in the old folder resulted in an error when generating)
Note: Tests for the pydot graph need to be updated. I am waiting for these new visualizations to be approved and finalized first.