BurntSushi / regex-automata

A low level regular expression library that uses deterministic finite automata.
The Unlicense
351 stars 26 forks source link

API for accessing to DFA internals (transition/states/classes/...) #29

Closed thomcc closed 1 year ago

thomcc commented 1 year ago

I'm hoping[^1] to use regex-automata to build/minimize DFAs that I then try and compile into a shift-based DFA, after (optionally) passing things through some hacky z3 code to see if it's possible to find state and transition values to reduce the table size (or specifically the row size).

[^1]: At the moment I have some very hacky code that does it to a hand-written automata (and to be fair real usage probably would want to do this anyway), but it would be more useful if it could be more generalized.

I don't think there's a way to do this using the public API at the moment — I need access to several parts of the DFA internals, such as states, transition tables, and possibly byte classes. this seems like the kind of thing that either you'd probably have no interest in supporting, or would have strong on how it's exposed, so I'm filing an issue first (it is admittedly niche, and could end up limiting the internal design too much).

An API like this would have other use cases too; two that spring to mind would be compiling DFAs to rust/c/asm, and futher debugging features — allowing regex-cli to print dot code for an automata seems like it might be nice.

(I imagine it also isn't specifically limited to DFAs, but my use case is)

BurntSushi commented 1 year ago

On mobile, but if I were to generically answer your question, I think I would say that all of that is already exposed. For example, Automaton::next_state is the transition table manifest as a transition function. And DFA::alphabet_len should tell you everything you need to know about equivalence classes I think? (Perhaps unless you need the actual mapping.)

So I guess, could you say in more detail what exactly you need?

Also, I am generally more open to exposing things in this crate as I consider it to be an "expert API." But I still want to present a coherent picture and not hamstring API flexibility too much. Exposing the byte classes, for example, doesn't strike me as an issue at all.

thomcc commented 1 year ago

Ah, let me look into that. It's possible I just missed it. Thanks.

BurntSushi commented 1 year ago

(I also have shift DFAs on my TODO list to experiment with and potentially add to this crate, but I've put it beyond the scope of getting the next release out.)

thomcc commented 1 year ago

But I still want to present a coherent picture and not hamstring API flexibility too much

For sure, that's why I filed an issue.

Automaton::next_state is the transition table manifest as a transition function. And DFA::alphabet_len should tell you everything you need to know about equivalence classes I think

Yeah this is awkward, but seems like it might be enough. I probably need the byte classes, but can get on without them for now.

I also have shift DFAs on my TODO list to experiment with and potentially add to this crate

They're pretty cool for the cases where they're usable for sure — been very impressed by the perf so far (faster than I'd expect given that it's still scalar).

BurntSushi commented 1 year ago

Closing this because it seems like existing APIs might be enough.

I am open to exposing more. Such as, for example, a list of state IDs or perhaps a list of State where State is some new public type. Concrete use cases would help.

For further discussion, please open issues on the regex crate repo, as regex-automata has moved there. :)