expr-lang / expr

Expression language and expression evaluation for Go
https://expr-lang.org
MIT License

Enable access to matched parts of a regular expression #229

Open vincentbernat opened 2 years ago

vincentbernat commented 2 years ago

Hey!

It would be nice to be able to access the parts matching a regex when using the matches operator. The captured parts could be assigned to some variables ($1, $2, etc.) or to a special matched map indexed by the positions and names of the matched parts. All of this could be done in a provided function, but putting it in the language gives better performance, as the regular expressions can be compiled at compile time instead of at execution time.
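To make the "map indexed by the index and names of matched parts" idea concrete, here is a minimal sketch using only Go's standard regexp package. The helper name captureMap and the sample input are hypothetical; nothing here is part of expr itself.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// captureMap is a hypothetical helper. It returns the submatches of pattern
// in input, keyed both by position ("0", "1", ...) and, for named groups,
// by group name. It returns a nil map when input does not match.
func captureMap(pattern, input string) (map[string]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	sub := re.FindStringSubmatch(input)
	if sub == nil {
		return nil, nil
	}
	out := make(map[string]string, 2*len(sub))
	// SubexpNames has one entry per group; entry 0 (the whole match) is "".
	for i, name := range re.SubexpNames() {
		out[strconv.Itoa(i)] = sub[i]
		if name != "" {
			out[name] = sub[i]
		}
	}
	return out, nil
}

func main() {
	m, err := captureMap(`^(?P<site>[^-]+)-(\d+)`, "paris-42.example.com")
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	fmt.Println(m["0"], m["1"], m["site"], m["2"]) // paris-42 paris paris 42
}
```

Note that this sketch compiles the pattern on every call, which is exactly the cost that compile-time support in the language would avoid.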

antonmedv commented 2 years ago

regular expressions can be compiled at compile time instead of at execution time.

This is also really easy to do on the user side: see ConstExpr.

antonmedv commented 2 years ago

What about adding regexp() builtin? It can be used something like this:

regexp("f.+").FindAllString(str, -1)

vincentbernat commented 2 years ago

As an example, I have a function ClassifySite("something") and a variant ClassifySiteRegex(Exporter.Name, "^([^-]+)-", "$1"). To leverage ConstExpr, I would need the user to wrap the regex into a function, so it would be inconvenient.

As for the builtin you propose, I suppose it returns an array. I was hoping for something a little more magic, as it would mean writing something like this:

ClassifySite(regexp("^([^-]+)-").FindAllString(Exporter.Name)[0])

And what happens if there is no match? I would prefer something like:

Exporter.Name matches "^([^-]+)-" && ClassifySite($1)

But I would understand that you don't like such magic variables as they make the language non-pure.
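For reference, the behavior wanted here (match, then substitute "$1", with an explicit answer for the no-match case) can be sketched with the standard library alone. The helper name expandMatch and the sample inputs are hypothetical, and this is not how ClassifySiteRegex is actually implemented.

```go
package main

import (
	"fmt"
	"regexp"
)

// expandMatch is a hypothetical helper. It matches pattern against input and,
// on success, expands template ("$1", "${name}", ...) with the captured groups.
// The boolean is false when the pattern is invalid or does not match.
func expandMatch(input, pattern, template string) (string, bool) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", false
	}
	// loc holds index pairs for the whole match and each capture group.
	loc := re.FindStringSubmatchIndex(input)
	if loc == nil {
		return "", false
	}
	return string(re.ExpandString(nil, template, input, loc)), true
}

func main() {
	site, ok := expandMatch("paris-edge1", "^([^-]+)-", "$1")
	fmt.Println(site, ok) // paris true

	_, ok = expandMatch("nodash", "^([^-]+)-", "$1")
	fmt.Println(ok) // false
}
```

Returning a boolean alongside the value plays the same role as the `matches ... && ...` guard in the proposed syntax: the caller decides what to do when nothing matched.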

antonmedv commented 2 years ago

Exporter.Name matches "^([^-]+)-" && ClassifySite($1)

I actually like this idea! Neat! I think it’s understandable what is going on here.

vincentbernat commented 2 years ago

Oh, great! It's how it is in Perl (I think, I don't remember exactly).

meharo commented 8 months ago

Sorry to bump the old thread. Here is a way to do this by extending expr.

// Declare a global cache for compiled regexes (optional).
// Protect it with a lock, as concurrent reads/writes may happen.
var (
        compiledRegex = make(map[string]*regexp.Regexp)
        mutex         sync.RWMutex
)

// This function may be called concurrently, but its local vars are safe.
func myFunc() {
        // Build the env map
        env := make(map[string]interface{})

        // reMatch holds the captured groups by regex if any.
        reMatch := make([]string, 0)
        // reFind holds a closure with access to the var "env". This access is needed, as you can see below.
        // reFind() returns true if it successfully captured any groups, false otherwise.
        reFind := func(input string, pattern string) bool {
                mutex.RLock()
                regex, exists := compiledRegex[pattern]
                mutex.RUnlock()

                if !exists {
                        var err error
                        regex, err = regexp.Compile(pattern)
                        if err != nil {
                                log.Error("Regex compile error:", err)
                                return false
                        }

                        mutex.Lock()
                        compiledRegex[pattern] = regex
                        mutex.Unlock()
                }

                // we store the captured groups if any.
                matches := regex.FindStringSubmatch(input)

                // we overwrite our captured strings slice to env["reMatch"] so that we can access the matches like reMatch[0] inside the expression.
                env["reMatch"] = &matches

                // FindStringSubmatch returns nil when there is no match.
                return matches != nil
        }

        // This is where we set the initial empty slice to env["reMatch"]. This can be overwritten by the reFind() later though.
        env["reMatch"] = &reMatch
        // we map the closure function reFind() to env["reFind"] so that it is accessible as reFind(input, 'regex_pattern') in the expression.
        env["reFind"] = reFind

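        // Compile, cache it, and run. Or just run.
        // exprString is the expression source, defined elsewhere.
        compiled, err := expr.Compile(exprString, expr.Env(env))
        if err != nil {
                log.Error("Expr compile error:", err)
                return
        }
        result, err := vm.Run(compiled, env)
        if err != nil {
                log.Error("Expr run error:", err)
                return
        }
        _ = result // use the result as needed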
}

Now the expression can be written like:

"reFind(input_string, '^(..)') ? reMatch[0] : 'unknown'"

reMatch is overwritten once reFind() is called. We may also call reFind() multiple times in the same expression. Access what you want from reMatch soon after each call.
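As a side note on the optional cache above, a standalone sketch of the same idea with sync.Map instead of a map guarded by sync.RWMutex could look like this. This is a swapped-in alternative, not the code from the comment above, and the name cachedRegexp is hypothetical. It also shows which index holds what in the slice returned by FindStringSubmatch.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache maps pattern strings to compiled *regexp.Regexp values.
var regexCache sync.Map

// cachedRegexp is a hypothetical helper that returns the compiled form of
// pattern, reusing an earlier compilation when one is cached.
func cachedRegexp(pattern string) (*regexp.Regexp, error) {
	if v, ok := regexCache.Load(pattern); ok {
		return v.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	// If another goroutine stored the same pattern first, keep its value.
	v, _ := regexCache.LoadOrStore(pattern, re)
	return v.(*regexp.Regexp), nil
}

func main() {
	re, err := cachedRegexp(`^(..)`)
	if err != nil {
		fmt.Println("compile error:", err)
		return
	}
	m := re.FindStringSubmatch("hello")
	// m[0] is the whole match; m[1] is the first capture group.
	fmt.Println(m[0], m[1]) // he he
}
```

With the pattern '^(..)' the whole match and the first group happen to be identical; for a pattern such as "^([^-]+)-" the first group would be at index 1, not 0.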

antonmedv commented 8 months ago

Expr now supports variables inside expressions. They can also be used here:

let matches = reFind("…"); matches[0]

PranavPeshwe commented 8 months ago

TFS, @antonmedv. Where could I have learnt about this? Is there any non-obvious document or code sample that I should keep an eye on to know of such updates? Thanks.

antonmedv commented 8 months ago

I post all changes to https://github.com/expr-lang/expr/releases. But I guess a dedicated blog post for release changes would be nice to have: https://expr-lang.org/blog

PS https://expr-lang.org/docs/language-definition#variables

amikai commented 5 days ago

Hi @antonmedv, I think adding regexp to the built-in is a great idea. Is there a plan for this?

If the regexp("...") function returns a *Regexp object, it would allow users to use the variety of methods supported by the regexp package.

antonmedv commented 5 days ago

@amikai true. I'm thinking of adding this in the next release of expr.