golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.95k stars 17.66k forks source link

regexp/syntax: character class negation is slow #69303

Open func25 opened 2 months ago

func25 commented 2 months ago

I've submitted a benchmark and a fix at https://github.com/golang/go/pull/69304.

New benchmark:

goos: darwin
goarch: arm64
pkg: regexp/syntax
cpu: Apple M1
BenchmarkString/^(.*);$|^;(.*)-8                 4972814               232.1 ns/op            56 B/op          3 allocs/op
BenchmarkString/(foo|bar$)x*-8                   5602014               212.6 ns/op            56 B/op          3 allocs/op
BenchmarkString/[^=,]-8                         21872083                53.84 ns/op            8 B/op          1 allocs/op
BenchmarkString/([^=,]+)=([^=,]+)-8              5726475               207.3 ns/op            56 B/op          3 allocs/op
BenchmarkString/([^=,]+)=([^=,]+),.*-8           4588252               259.2 ns/op            56 B/op          3 allocs/op
PASS
ok      regexp/syntax   7.603s

Go version

1.23 darwin/arm64

What did you do?

I received a report at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6911 and ran a benchmark for Regexp.String().

What did you see happen?

Some regexes are extremely slow, even when they're simple. The negate [^] causes calcFlags to run over a large character space to find a fold case.

goos: darwin
goarch: arm64
pkg: regexp/syntax
cpu: Apple M1
BenchmarkString/^(.*);$|^;(.*)-8                 4594401               253.2 ns/op            56 B/op          3 allocs/op
BenchmarkString/(foo|bar$)x*-8                   5006730               236.1 ns/op            56 B/op          3 allocs/op
BenchmarkString/[^=,]-8                              256           4227434 ns/op               8 B/op          1 allocs/op
BenchmarkString/([^=,]+)=([^=,]+)-8                  151           8032660 ns/op              56 B/op          3 allocs/op
BenchmarkString/([^=,]+)=([^=,]+),.*-8               146           8095255 ns/op              56 B/op          3 allocs/op
PASS
ok      regexp/syntax   9.011s
gabyhelp commented 2 months ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

gopherbot commented 2 months ago

Change https://go.dev/cl/611300 mentions this issue: regexp/syntax: Optimize calcFlags() for OpCharClass fold case