Closed EduardoGoulart1 closed 1 year ago
I'm currently working on finalizing this PR and I would like to hear your opinions before proceeding.
When converting from partial to complete DFAs, it is necessary to add a trap state. So far I was using None for this purpose, but it doesn't work well with networkx.DiGraph because that library doesn't allow nodes labeled None in the graph. It is not too difficult to work around this problem for the DFA class, but it becomes problematic when calling NFA.to_dfa (or other automata types) as this invariant would need to propagate throughout the code. So I need another approach to generate new trap states.
The only solution I can think of that does not require significant refactoring, is to assign the highest negative integer number that isn't already a state. This solution works well, but sometimes states are strings or frozen sets, which leads to a typing mismatch that isn't directly caused by the user. To some extent, this can be mitigated on our side, but I don't think that we can completely avoid it.
The more radical solution would be to make all DFAs partial. That would simplify a lot of the code, but would cost some performance for "dense" DFAs
@EduardoGoulart1 Regarding mixed state types, that's a fair question. @eliotwrobson I think we've updated the library to handle, for example, states of mixed types, correct? I believe some existing methods (maybe DFA.minify
?) produce machines with such mixed state types.
If this is the case, then I'm fine with the highest negative integer number for the trap state name.
@caleb531 yes, the library can handle mixed state types, and there are a couple of places this is used I think (see from_finite_language
). The type annotations are also compatible with mixed-type states. @EduardoGoulart1 I think that's a fine convention, as there are a few places where we start assigning integer state names starting from 0 (in the regex code I believe).
Also, adapting the test is fairly easy, but some tests have hard-coded values for the expected result which for me does not make sense and complicates reusing it for testing partial DFAs. For instance the test below (test_union
) is explicitly testing the union operator by constructing the DFA representing the result.
Is there a specific reason for that? I would have expected the test to verify its algebraic properties. For example something like A \subseteq (A \cup B) && B \subseteq (A \cub B) && (A \cub B) - A - B == \emptyset
. If there is no such reason, then I will proceed to update such tests (this will also remove many lines of code)
@EduardoGoulart1 the reason for these is to verify the state names and types of the output (which when names are retained, is part of the API). These test cases should be kept with their hard-coded output. There actually is a test case testing for algebraic properties separately.
I'm not sure about the best way to test with allow partial for these. Is there a way to use parameterized test cases (like in pytest) with nose? The DFAs in those test cases are not partial anyway, so I don't think they can really be meaningfully adapted to test for the new behavior in this PR.
@eliotwrobson @EduardoGoulart1 It does look like nose2 supports test parameterization! Although since DFAs can be verbose to construct, I wonder if the decorator signature would get pretty large. But please feel free to play around to find something practical and maintainable:
@caleb531 @eliotwrobson sure my initial idea was to extend the tests to use dfa.as_partial(). The only places where this does not work is for such hard-coded tests. The problem is that we do not expand all the states. But I will try my best with these constraints in mind :D
@EduardoGoulart1 just a heads up that, because of the size of the refactor and some weirdness that was uncovered, I want to wait until #129 is merged before merging this. I fixed the last major blocker there, so hopefully that will happen soon.
@EduardoGoulart1 @eliotwrobson Just merged #129! I'm working on resolving the merge conflicts now.
@EduardoGoulart1 Some merge conflicts came up when #152 was merged. I've resolved those and gotten the lint check passing in the latest push (with some minor style changes and optimizations). Please pull when you get the chance.
@eliotwrobson @EduardoGoulart1 What's outstanding for this ticket before it can be reviewed/merged? I believe this is the last major piece that is necessary for the v8 release.
@caleb531 the changes I made were pretty minor on top of what was here to start. Per previous comments, I think all that's left is to write tests for everything. Not sure how far along @EduardoGoulart1 was with that.
Yes, basically the tests... I stopped halfway through them because it's quite a lot to go over :sweat:
I stopped halfway through them because it's quite a lot to go over
@EduardoGoulart1 could you push what you have (even if partially completed)? This should speed up the review process at least somewhat.
@EduardoGoulart1 Can you please make sure you've pushed up your latest work so @eliotwrobson can finish the tests? I'd love to get this reviewed/merged soon.
@EduardoGoulart1 @caleb531 in the latest push, I streamlined and update the test cases to incorporate partial DFAs, including tests that check for interoperability with partial and non-partial DFAs. This bought the line coverage up to 100% in the DFA file.
There are a couple of minor things left to do. The main thing is double checking over the test suite to ensure we are testing extensively enough for the partial behavior. The other thing is doing a pass over the DFA code to clean things up as much as possible. There were places where I removed a large amount of code, but I may have missed some things. Finally, we need to update the documentation.
Even though I'm not quite ready to call this done, I think this is in a mature enough state that we can start reviewing, as it should be mostly code complete at this point.
@EduardoGoulart1 @caleb531 in the latest push, I streamlined and update the test cases to incorporate partial DFAs, including tests that check for interoperability with partial and non-partial DFAs. This bought the line coverage up to 100% in the DFA file.
There are a couple of minor things left to do. The main thing is double checking over the test suite to ensure we are testing extensively enough for the partial behavior. The other thing is doing a pass over the DFA code to clean things up as much as possible. There were places where I removed a large amount of code, but I may have missed some things. Finally, we need to update the documentation.
Even though I'm not quite ready to call this done, I think this is in a mature enough state that we can start reviewing, as it should be mostly code complete at this point.
Does it make sense to start over with a "clean" PR? Now I am a bit lost in the middle of all the recent updates
@EduardoGoulart1 I don't think it makes sense to make a new one, but you'll probably have to review the entire current set of changes instead of any one commit.
Just had a quick look over it. I would say it's in a state to start reviewing it, so I will just drop my review... You can drop yours and we can go over it I guess. But who is in charge of applying the changes? Me?
But who is in charge of applying the changes?
Depending on the number of changes, I can probably apply them
Since there are only minor comments from my side, and I don't want to block progress because of my limited time, I am good with whatever decisions you meet here so we can merge it (I can't approve my own PR). Sorry for leaving the work incomplete, but hope it was still of some help
I've worked through a few of the comment threads, closing the ones that are taken care of, and commenting on a few others.
@EduardoGoulart1 @eliotwrobson Would it be possible for you guys to work through the outstanding comment threads? I would personally prefer to review this PR only once these comment threads have been resolved, and I don't have anything more to say on the ones that are still open.
Would it be possible for you guys to work through the outstanding comment threads? I would personally prefer to review this PR only once these comment threads have been resolved, and I don't have anything more to say on the ones that are still open.
@caleb531 sounds good, I'm resolving some now and the remaining threads are fairly short.
@caleb531 all items are resolved and removed all TODOs. The only thing left is docs changes, but I'll leave that until after your review. Thanks!
@EduardoGoulart1 @eliotwrobson This looks good to me! Will merge now.
Thank you both for all the work on this PR.
@eliotwrobson @caleb531 Please do not review it because the code is not ready yet. This is just a draft to book progress on the allow_partial support.
If you go with the definition that DFAs are sets of words, then all operations are well-defined and relatively easy to implement. Most of the times the only required change is to replace loops like:
By the code:
I made the code such that it adds no overhead for complete DFAs compared to the current implementation. For partial DFAs, if the DFAs are sparse, then you will get a large performance boost, while if they are dense you will pay some penalty (but most of the times negligible).
I feel like we still need to discuss a few things:
Reserve something (probably None) to represent the trap state. Basically, we add the invariant that if None is part of the set of states, then it must be a trap state. This is already implicitly assumed in. Because network graph does not allow for states labeled None, we replaced it with automatically generated integers_get_next_current_state
and greatly simplifies the implementation logicMissing: