dlclark / regexp2

A full-featured regex engine in pure Go based on the .NET engine
MIT License

Is there any workaround for `split`? #85

Open i-am-the-slime opened 1 month ago

i-am-the-slime commented 1 month ago

Thanks for this nice library!

I'm using this library from another language that can compile to Golang. I've now finally hit the case where I use a library that needs to split on a regex. You mention in the README that you're still working on this. Do you happen to have a draft or other unfinished code that can do some splitting (maybe slow, maybe wrong in edge cases)?

dlclark commented 1 month ago

I had written a split function (based on C#) for the code-gen version of the library. I suspect it'll work with the main version as well, but there are probably edge cases:

// Split splits the given input string using the pattern and returns
// a slice of the parts. Count limits the number of matches to process.
// If Count is -1, then it will process the input fully.
// If Count is 0, returns nil. If Count is 1, returns the original input.
// The only expected error is a Timeout, if it's set.
//
// If capturing parentheses are used in the pattern, any captured
// text is included in the resulting slice.
// For example, with the pattern "-", Split("a-b", -1) returns ["a", "b"],
// but with the pattern "(-)", Split("a-b", -1) returns ["a", "-", "b"].
func (re *Regexp) Split(input string, count int) ([]string, error) {
    if count < -1 {
        return nil, errors.New("count too small")
    }
    if count == 0 {
        return nil, nil
    }
    if count == 1 {
        return []string{input}, nil
    }
    if count == -1 {
        // no limit
        count = math.MaxInt64
    }

    // iterate through the matches
    priorIndex := 0
    var retVal []string
    var txt []rune

    m, err := re.FindStringMatch(input)

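    // count pieces need at most count-1 splits, consistent with count == 1
    // returning the input unsplit above
    for ; m != nil && count > 1; m, err = re.FindNextMatch(m) {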
        txt = m.text
        // a non-nil m implies a nil err
        // append the text between the previous match and this one
        retVal = append(retVal, string(txt[priorIndex:m.Index]))
        // append any capture groups, skipping group 0
        gs := m.Groups()
        for i := 1; i < len(gs); i++ {
            retVal = append(retVal, gs[i].String())
        }
        priorIndex = m.Index + m.Length
        count--
    }

    if err != nil {
        return nil, err
    }

    if txt == nil {
        // we never matched, return the original string
        return []string{input}, nil
    }

    // append our remainder
    retVal = append(retVal, string(txt[priorIndex:]))

    return retVal, nil
}

It uses the private m.text field, but I'm sure it could be written without it for your purposes. Let me know if you run into any issues. I could look at adding this to the main library version.
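
One way to avoid the private field might look like the sketch below. It assumes that the exported Index and Length values on a match are rune offsets into the input, so the caller can convert the input to []rune itself instead of reading the match's internal copy. The span type and the splitAtSpans name are made up for illustration and are not part of regexp2.

```go
package main

// span is a hypothetical helper type: one match location in rune offsets,
// mirroring the exported Index and Length fields of a regexp2 match.
type span struct {
	Index, Length int
}

// splitAtSpans returns the pieces of input that lie between the given spans.
// Spans are assumed to be in ascending order and non-overlapping, as produced
// by walking matches from left to right.
func splitAtSpans(input string, spans []span) []string {
	// Match offsets count runes, not bytes, so slice a []rune copy of the
	// input rather than the string itself.
	runes := []rune(input)
	parts := make([]string, 0, len(spans)+1)
	prior := 0
	for _, s := range spans {
		parts = append(parts, string(runes[prior:s.Index]))
		prior = s.Index + s.Length
	}
	// Whatever follows the last match is the final piece.
	return append(parts, string(runes[prior:]))
}
```

With regexp2, the spans could be collected by looping over re.FindStringMatch(input) and re.FindNextMatch(m) and recording span{m.Index, m.Length} for each match. Captured groups, if wanted, would still need to be appended from m.Groups() as in the function above.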

dlclark commented 4 weeks ago

@i-am-the-slime did this help?