diprism / perpl

The PERPL Compiler
MIT License
10 stars 5 forks source link

Regular expression example (robust mu-types) #44

Open davidweichiang opened 2 years ago

davidweichiang commented 2 years ago

If we define a type for regexps and a matching function

data String = Nil | Cons Symbol
data Regexp = Atom Symbol | Union Regexp Regexp | Concat Regexp Regexp | Star Regexp
define match' re w =
  case re of
    Atom a => case w of Nil => fail | Cons b w => if a = b then w else fail
    Union re1 re2 => amb (match' re1 w) (match' re2 w)
    Concat re1 re2 => let w in match' re1 w in match' re2 w
    Star re1 => let w = match' re1 w in match' (Star re1) w
define match re w = case match' re w of Nil => () | Cons _ _ => fail

Defunctionalizing Regexp would automatically convert it into a finite automaton, which would be awesome. But it seems unavoidable that the Star case copies re1. But if re is deterministic, it seems like it should be fine for it to be copyable. Can we relax the rules without making them even more complicated?

davidweichiang commented 2 years ago

The problem case is

define r = Star (Atom (amb A B))

Should this match A B or not? In other words, should fold evaluate its argument before doing the fold, or should it be lazy?

davidweichiang commented 2 years ago

Example with fixed regexp:

match (Star (Atom A)) (Cons A Nil)

Then Regexp would defunctionalize to:

data RegexpF = R | R1
data Regexp = Atom Symbol | Union RegexpF RegexpF | ...
define unfoldRegexp re = case re of R -> Star R1 | R1 -> Atom A

Now unfoldRegexp is positive/robust, and copying it is ok.

davidweichiang commented 2 years ago

Example with random regexp:

match (Star (amb (Atom A) (Atom B))) (Cons A (Cons B Nil))

Defunctionalize:

data RegexpF = R | R1A | R1B
data Regexp = Atom Symbol | Union RegexpF RegexpF | ...
define unfoldRegexp re = case re of R -> Star (amb R1A R1B) | R1A -> Atom A | R1B -> Atom B

IMO the program should always return False, but after defunctionalization it can return True.

davidweichiang commented 2 years ago

What happens if we write a copyRegexp function by hand?

ccshan commented 2 years ago
copyRegexp re =
  case re of
    Star re' -> let (re1, re2) = copyRegexp re' in (Star re1, Star re2)
    Atom a -> let (a1, a2) = copyAtom a in (Atom a1, Atom a2)
ccshan commented 2 years ago

Do we need to write

copyAtom a =
  case a of A -> (A,A)
            B -> (B,B)
davidweichiang commented 11 months ago

If this problem could be overcome, I would expect that an extension for bounded repetition (https://dl.acm.org/doi/pdf/10.1145/3586044) would come for free.