UnkindPartition / regex-applicative

Regex-based parsing with an applicative interface
MIT License
130 stars 12 forks source link

Render to regex syntax #36

Open chrisdone opened 5 years ago

chrisdone commented 5 years ago

I'm making a new form library like formlets, and considering the ability to embed a regex as a validator/parser which would run on the server side with regex-applicative upon submission, but also, as a UX optimization, compile the same regex to a [a-z]{2}+-style regex to be used as the pattern=".." attribute for html5 (avoiding a roundtrip just to do a simple text format check). This would enable a developer to build forms with client and server-side validation composed using the same regex-applicative recipe.

So the question is: is there already a library or module that would convert an RE s a to Text? E.g.

render (many (sym 'a')) => "a*"

I assume I could write my own, but perhaps it already exists.

Thanks!

UnkindPartition commented 5 years ago

Unfortunately, this is not possible because sym 'a' is translated to psym (== 'a'), and you cannot extract 'a' out of (== 'a').

chrisdone commented 5 years ago

@feuerbach Ah, ok. So it was worth asking. Thanks.

I suppose then I would need to wrap up the API with an interface which doesn't expose psym. E.g.

data Regex a

with a Regex a -> RE Char a conversion function. Whipping up a simple GADT that is the same as the RE one but without this capability should be trivial, and I can re-use the rest of the library.

If you're interested, I could send it as a PR against this package as e.g. Text.Regex.Applicative.Renderable or w/e, otherwise package it as regex-applicative-renderable or something. I'm sure I'll find other uses than just this form package, and the work is probably trivial.

UnkindPartition commented 5 years ago

Yeah, or maybe have a class containing sym and perhaps some other functions, and make both types instances of that class.

The other constructors it should probably support are:

  1. Recognizing a set (as in Data.Set.Set) of characters — can be simulated by summing individual syms, but could be implemented more efficiently.
  2. Recognizing a class (as in a POSIX regex class — alpha, alnum etc.) of characters — can in theory be implemented via individual syms, but in practice would be prohibitive.

This new module/representation might also support more efficient compilation, i.e. it's potential utility won't be restricted just to rendering. Let's call it Text.Regex.Applicative.Char?

chrisdone commented 5 years ago

OK, great then. When I get to this part of my lib I'll circle back to this issue. Thanks, Roman.

treeowl commented 5 years ago

A Set of characters will perform considerably worse, for almost all operations, than an IntMap of their equivalents.

On Sat, Sep 14, 2019, 4:23 PM Roman Cheplyaka notifications@github.com wrote:

Yeah, or maybe have a class containing sym and perhaps some other functions, and make both types instances of that class.

The other constructors it should probably support are:

  1. Recognizing a set (as in Data.Set.Set) of characters — can be simulated by summing individual syms, but could be implemented more efficiently.
  2. Recognizing a class (as in a POSIX regex class — alpha, alnum etc.) of characters — can in theory be implemented via individual syms, but in practice would be prohibitive.

This new module/representation might also support more efficient compilation, i.e. it's potential utility won't be restricted just to rendering. Let's call it Text.Regex.Applicative.Char?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/feuerbach/regex-applicative/issues/36?email_source=notifications&email_token=AAOOF7LLQYO3GZSCCMMTNE3QJVB3NA5CNFSM4IWRCSDKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6XDKUQ#issuecomment-531510610, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOOF7NXIJDAAYRNRC5ZRMTQJVB3NANCNFSM4IWRCSDA .

chrisdone commented 5 years ago

@treeowl do you have benchmarks for that?

ceedubs commented 4 years ago

I stumbled upon this discussion today. I had written a Scala regex library that a while ago I rewrote to more or less be a port of regex-applicative. The way I handled this in my library is that the the equivalent of the Symbol constructor (I call it Elem) has a separate metadata field of an abstract type m (which ends up needing to be another type parameter on RE).

For character (and any other discrete type) regexes, the m type ends up being a type that looks something like:

data Match a = Literal a | Wildcard | Allow (Diet a) | Forbid (Diet a)

where Diet is a discrete interval tree. This represents the notion of character classes in traditional regular expressions, supporting negation, unions and intersections. You can then write helper methods that implement the a -> maybe b function based on the Match.

In addition to being able to get a string representation of the regex, a nice benefit of having this metadata around is that you can then convert a regular expression to a Quickcheck generator that generates values that match that regex. I have a demo page here that I think is kind of fun.

Now is probably a good time to say thanks for creating this library, @feuerbach! As soon as I saw it I knew that I wanted to rewrite my Scala library :)

Edit: P.S. the polymorphic metadata field also makes a convenient place to throw a ThreadId into a tuple when compiling the regex so you don't have to have ThreadId baked into the user construction of the regex.

UnkindPartition commented 4 years ago

Thanks @ceedubs, those are some nice ideas.