dlclark / regexp2

A full-featured regex engine in pure Go based on the .NET engine
MIT License

Line Terminator (Dollar sign) does not match as expected #73

Closed: ekacoei closed this issue 9 months ago

ekacoei commented 9 months ago

Hi there,

I am working on Advent of Code input. It is a LeetCode-style exercise where you are expected to extract a few numbers from a pile of ASCII garbage. I expect a PCRE engine to match $ at either end of line or end of file. With the dollar sign $ in the pattern, I do not get the expected match.

Sample input:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

Regex (extract the first and last digit of each line; the first and last digit may be the same): ^\D*(\d).*(\d)\D*$|((\d))

Expected output

$ echo '1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
' |grep -P --color=auto '^\D*(\d).*(\d)\D*$|((\d))'
**1abc2**
**pqr3stu8vwx**
**a1b2c3d4e5f**
treb**7**uchet

Sample Go program to illustrate the issue:

package main

import (
    "fmt"

    "github.com/dlclark/regexp2"
)

var puzzleinput string = `1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet`

func main() {
    // re := regexp2.MustCompile(`^\D*(\d).*(\d)\D*$|((\d))`, 0) // matches incorrectly, match result "1"
    re := regexp2.MustCompile(`^\D*(\d).*(\d)\D*||((\d))`, 0) // matches as expected
    if m, _ := re.FindStringMatch(puzzleinput); m != nil {
        // fmt.Printf(m.Groups()[0].Captures[0].String(), m.Groups()[0].Captures[1].String())
        fmt.Printf(m.String())
    }
}

dlclark commented 9 months ago

There are a couple errors in this code:

  1. Just like in PCRE, if you want $ to match at the end of a line as well as at the end of the string, you need to use the Multiline option (play around with this on https://regex101.com/r/1NSmAJ/1). I believe grep always splits on newlines and processes each line as a separate string, so this doesn't come up there.
  2. Your pattern will need to match multiple times (once per line), so you need to loop through the matches to see them all.

Here's the fixed code:

var puzzleinput string = `1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet`
re := regexp2.MustCompile(`^\D*(\d).*(\d)\D*$|((\d))`, regexp2.Multiline)

for m, _ := re.FindStringMatch(puzzleinput); m != nil; m, _ = re.FindNextMatch(m) {
    fmt.Printf("Match: %v\n", m.String())
}