Proper support for allow_partial

EduardoGoulart1 commented 1 year ago

@eliotwrobson @caleb531 Please do not review it because the code is not ready yet. This is just a draft to book progress on the allow_partial support.

If you go with the definition that DFAs are sets of words, then all operations are well-defined and relatively easy to implement. Most of the times the only required change is to replace loops like:

for symbol in input_symbols:
    tgt_a = transitions_a[symbol]

By the code:

for symbol in input_symbols:
    tgt_a = transitions_a.get(tgt_a, None)

I made the code such that it adds no overhead for complete DFAs compared to the current implementation. For partial DFAs, if the DFAs are sparse, then you will get a large performance boost, while if they are dense you will pay some penalty (but most of the times negligible).

I feel like we still need to discuss a few things:

Reserve something (probably None) to represent the trap state. Basically, we add the invariant that if None is part of the set of states, then it must be a trap state. This is already implicitly assumed in _get_next_current_state and greatly simplifies the implementation logic. Because network graph does not allow for states labeled None, we replaced it with automatically generated integers
We could think of making this allow_partial thing invisible to the users. Instead of passing allow_partial as a parameter, we compute internally the flag "is_partial". I go by the principle that we should not restrict our API unless it would lead to misuse.
Now that support for partial DFAs is there, it is straightforward to extend operations to support DFAs with different alphabets

Missing:

[ ] Adjust/add tests
[ ] Add benchmarks

EduardoGoulart1 commented 1 year ago

I'm currently working on finalizing this PR and I would like to hear your opinions before proceeding.

When converting from partial to complete DFAs, it is necessary to add a trap state. So far I was using None for this purpose, but it doesn't work well with networkx.DiGraph because that library doesn't allow nodes labeled None in the graph. It is not too difficult to work around this problem for the DFA class, but it becomes problematic when calling NFA.to_dfa (or other automata types) as this invariant would need to propagate throughout the code. So I need another approach to generate new trap states.

The only solution I can think of that does not require significant refactoring, is to assign the highest negative integer number that isn't already a state. This solution works well, but sometimes states are strings or frozen sets, which leads to a typing mismatch that isn't directly caused by the user. To some extent, this can be mitigated on our side, but I don't think that we can completely avoid it.

The more radical solution would be to make all DFAs partial. That would simplify a lot of the code, but would cost some performance for "dense" DFAs

caleb531 commented 1 year ago

@EduardoGoulart1 Regarding mixed state types, that's a fair question. @eliotwrobson I think we've updated the library to handle, for example, states of mixed types, correct? I believe some existing methods (maybe DFA.minify?) produce machines with such mixed state types.

If this is the case, then I'm fine with the highest negative integer number for the trap state name.

eliotwrobson commented 1 year ago

@caleb531 yes, the library can handle mixed state types, and there are a couple of places this is used I think (see from_finite_language). The type annotations are also compatible with mixed-type states. @EduardoGoulart1 I think that's a fine convention, as there are a few places where we start assigning integer state names starting from 0 (in the regex code I believe).

coveralls commented 1 year ago

coverage: 99.812% (-0.07%) from 99.883% when pulling 125d244cb17416a712bf922863257e1d0d52d7fe on EduardoGoulart1:proper_support_allow_partial into f1c3674b90856e179a73779570bae7c86c5d7844 on caleb531:develop.

EduardoGoulart1 commented 1 year ago

Also, adapting the test is fairly easy, but some tests have hard-coded values for the expected result which for me does not make sense and complicates reusing it for testing partial DFAs. For instance the test below (test_union) is explicitly testing the union operator by constructing the DFA representing the result.

Is there a specific reason for that? I would have expected the test to verify its algebraic properties. For example something like A \subseteq (A \cup B) && B \subseteq (A \cub B) && (A \cub B) - A - B == \emptyset. If there is no such reason, then I will proceed to update such tests (this will also remove many lines of code)

eliotwrobson commented 1 year ago

@EduardoGoulart1 the reason for these is to verify the state names and types of the output (which when names are retained, is part of the API). These test cases should be kept with their hard-coded output. There actually is a test case testing for algebraic properties separately.

I'm not sure about the best way to test with allow partial for these. Is there a way to use parameterized test cases (like in pytest) with nose? The DFAs in those test cases are not partial anyway, so I don't think they can really be meaningfully adapted to test for the new behavior in this PR.

caleb531 commented 1 year ago

@eliotwrobson @EduardoGoulart1 It does look like nose2 supports test parameterization! Although since DFAs can be verbose to construct, I wonder if the decorator signature would get pretty large. But please feel free to play around to find something practical and maintainable:

https://docs.nose2.io/en/latest/params.html

EduardoGoulart1 commented 1 year ago

@caleb531 @eliotwrobson sure my initial idea was to extend the tests to use dfa.as_partial(). The only places where this does not work is for such hard-coded tests. The problem is that we do not expand all the states. But I will try my best with these constraints in mind :D

eliotwrobson commented 1 year ago

@EduardoGoulart1 just a heads up that, because of the size of the refactor and some weirdness that was uncovered, I want to wait until #129 is merged before merging this. I fixed the last major blocker there, so hopefully that will happen soon.

caleb531 commented 1 year ago

@EduardoGoulart1 @eliotwrobson Just merged #129! I'm working on resolving the merge conflicts now.

eliotwrobson commented 1 year ago

@EduardoGoulart1 Some merge conflicts came up when #152 was merged. I've resolved those and gotten the lint check passing in the latest push (with some minor style changes and optimizations). Please pull when you get the chance.

caleb531 commented 1 year ago

@eliotwrobson @EduardoGoulart1 What's outstanding for this ticket before it can be reviewed/merged? I believe this is the last major piece that is necessary for the v8 release.

eliotwrobson commented 1 year ago

@caleb531 the changes I made were pretty minor on top of what was here to start. Per previous comments, I think all that's left is to write tests for everything. Not sure how far along @EduardoGoulart1 was with that.

EduardoGoulart1 commented 1 year ago

Yes, basically the tests... I stopped halfway through them because it's quite a lot to go over :sweat:

eliotwrobson commented 1 year ago

I stopped halfway through them because it's quite a lot to go over

@EduardoGoulart1 could you push what you have (even if partially completed)? This should speed up the review process at least somewhat.

caleb531 commented 1 year ago

@EduardoGoulart1 Can you please make sure you've pushed up your latest work so @eliotwrobson can finish the tests? I'd love to get this reviewed/merged soon.

eliotwrobson commented 1 year ago

@EduardoGoulart1 @caleb531 in the latest push, I streamlined and update the test cases to incorporate partial DFAs, including tests that check for interoperability with partial and non-partial DFAs. This bought the line coverage up to 100% in the DFA file.

There are a couple of minor things left to do. The main thing is double checking over the test suite to ensure we are testing extensively enough for the partial behavior. The other thing is doing a pass over the DFA code to clean things up as much as possible. There were places where I removed a large amount of code, but I may have missed some things. Finally, we need to update the documentation.

Even though I'm not quite ready to call this done, I think this is in a mature enough state that we can start reviewing, as it should be mostly code complete at this point.

EduardoGoulart1 commented 1 year ago

@EduardoGoulart1 @caleb531 in the latest push, I streamlined and update the test cases to incorporate partial DFAs, including tests that check for interoperability with partial and non-partial DFAs. This bought the line coverage up to 100% in the DFA file.

There are a couple of minor things left to do. The main thing is double checking over the test suite to ensure we are testing extensively enough for the partial behavior. The other thing is doing a pass over the DFA code to clean things up as much as possible. There were places where I removed a large amount of code, but I may have missed some things. Finally, we need to update the documentation.

Even though I'm not quite ready to call this done, I think this is in a mature enough state that we can start reviewing, as it should be mostly code complete at this point.

Does it make sense to start over with a "clean" PR? Now I am a bit lost in the middle of all the recent updates

eliotwrobson commented 1 year ago

@EduardoGoulart1 I don't think it makes sense to make a new one, but you'll probably have to review the entire current set of changes instead of any one commit.

EduardoGoulart1 commented 1 year ago

Just had a quick look over it. I would say it's in a state to start reviewing it, so I will just drop my review... You can drop yours and we can go over it I guess. But who is in charge of applying the changes? Me?

eliotwrobson commented 1 year ago

But who is in charge of applying the changes?

Depending on the number of changes, I can probably apply them

EduardoGoulart1 commented 1 year ago

Since there are only minor comments from my side, and I don't want to block progress because of my limited time, I am good with whatever decisions you meet here so we can merge it (I can't approve my own PR). Sorry for leaving the work incomplete, but hope it was still of some help

caleb531 commented 1 year ago

I've worked through a few of the comment threads, closing the ones that are taken care of, and commenting on a few others.

@EduardoGoulart1 @eliotwrobson Would it be possible for you guys to work through the outstanding comment threads? I would personally prefer to review this PR only once these comment threads have been resolved, and I don't have anything more to say on the ones that are still open.

eliotwrobson commented 1 year ago

Would it be possible for you guys to work through the outstanding comment threads? I would personally prefer to review this PR only once these comment threads have been resolved, and I don't have anything more to say on the ones that are still open.

@caleb531 sounds good, I'm resolving some now and the remaining threads are fairly short.

eliotwrobson commented 1 year ago

@caleb531 all items are resolved and removed all TODOs. The only thing left is docs changes, but I'll leave that until after your review. Thanks!

caleb531 commented 1 year ago

@EduardoGoulart1 @eliotwrobson This looks good to me! Will merge now.

Thank you both for all the work on this PR.

caleb531 / automata

Proper support for allow_partial #147