Open jeffrey82221 opened 1 year ago
Ooooh that's an interesting feature! I think it's definitely possible. We can define custom dunder methods like __or__
and __and__
and __contains__
for the syntax you suggested. We'd also need some type checking in these methods because you can't union the patterns abc+
and [a-z]
because the former isn't a really a set. What would the output of Range("1", "5") | Range("8", "10")
be? Seeing as there's a gap interval between 5 and 8 that we'd have to exclude. Secondly, would those behavior only apply to Range
? If not what would you like the behavior to be for other classes?
Let's first focus on the __or__
operator, since it can help simplify the regex and make regex more generalize.
My suggestion is that we can build up rules for those regex pairs that can intuitively merged together and simply return Or
or Set
regex class for those that is difficult to merge.
For the cases you mentioned about:
A: 'abc+'; B: r'[a-z]' -> A | B = ?
-> A is a patterns ab
follows by more than one c
. B is a-z.
-> A and B are two entirely different patterns
-> Result is Or('abc+', '[a-z]')
A: Range("1", "5"); B: Range("8", "9") -> A | B = ?
-> A is 1-5, B is 8-9.
-> A and B does not overlap
-> Result is Set(Range("1", "5"), Range("8", "9"))
Of course, we need to specify the behavior between all the classes. It can be hard to elaborate, but let me try to explain my first idea.
We can first categorize the regex classes into char-level classes such as Range
, Set
, NotSet
, and Special Chars (Any
, WORD
, DIGIT
, ...) and compositional classes such as Amount
, Multi
, Optional
, Or
.
For the char-level classes, we can do the A | B = C operation in the following steps:
Range
regex. Range
and special chars with Set
.For the compositional regex, Amount
, Multi
, Optional
.
For Or
, return Or
, too.
Currently, I have no idea for the advance classes that inherent Extension
. (e.g., NamedGroup)
Sounds good. I'll review your PR when you're finished. I'll also add set operations for the advanced classes too before that. 👍
Hi @GrandMoff100 ,
I would like to suggest a new feature for this package. Since a regex is a union of multiple patterns, it can be considered a
set
. How about building union (or) / intersect (and) / is subset of (in) operators to each regex classes?For example:
I am trying to build a regex inference engine. This feature will be beneficial for organizing the regex(es) when the size of the regex object list is large. I wonder if it's possible to add this feature? I am willing to make this contribute.