flier / gohs

Golang binding of Hyperscan: https://www.hyperscan.io/

Incorrect match results under high load #42

Status: closed (closed by ivchcm 1 year ago)

ivchcm commented 2 years ago

We are using Hyperscan in a high-load system (up to 50k requests per second).

Environment: gohs 1.1.0, Ubuntu 20.04, libhyperscan5 5.2.1-1build1.

To match the patterns we use a VectoredDatabase. On a local PC without any load (unit tests) everything works as expected, but in production we have an issue. For an exact-match pattern we get a false result on a string that is completely correct (double-checked with a simple regexp match), and, for some reason, we get a true match for a string that does not appear in any of our patterns.

The code looks something like this:

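patterns := make([]*hyperscan.Pattern, len(patternStrings))
for i, s := range patternStrings {
    // Anchor each string so it matches only the whole input
    // (assumes s contains no regex metacharacters).
    patterns[i] = hyperscan.NewPattern("^"+s+"$", hyperscan.Utf8Mode|hyperscan.SingleMatch)
    // IDs start at 1; the match callback receives this value as id.
    patterns[i].Id = i + 1
}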
database, err := hyperscan.NewVectoredDatabase(patterns...)
if err != nil {
    return nil, err
}

scratch, err := hyperscan.NewScratch(database)
if err != nil {
    database.Close()
    return nil, err
}

And the scan looks like this:

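result := false
err := database.Scan([][]byte{[]byte("somePatternString")}, scratch, func(id uint, from, to uint64, flags uint, context interface{}) error {
    // Called once per match; id is the Pattern.Id that matched.
    result = true
    return nil
}, nil) // last argument is the user context passed to the callback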

Here I get false results (the match handler callback is never called). I also get a true result for a completely different input string. Even more confusing, the true result happens for only one other string. For example, if I have string1, string2, string4, and string5, and an exact-match pattern is created for string1, I get a true match for string2 and never for any other string.
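With exact-match patterns, the correct answer for any input can be computed without Hyperscan: the IDs of the patterns whose string equals the input. A debugging sketch along those lines (not something the issue or gohs prescribes) records the IDs the scan actually reports instead of a single boolean, and compares them with that expectation, turning "a true match for string2" into "pattern N fired for string2". `matchCollector`, `buildOracle`, and `diffMatches` are hypothetical names; the collector's `handle` method has the same parameter list as the scan callback in the issue, so a method value like `c.handle` should be usable in its place.

```go
package main

import "fmt"

// matchCollector records pattern IDs reported during one scan.
// Use one collector per scan; it is not safe for concurrent use.
type matchCollector struct{ ids []uint }

// handle has the same parameter list as the scan callback in the issue.
func (c *matchCollector) handle(id uint, from, to uint64, flags uint, context interface{}) error {
	c.ids = append(c.ids, id)
	return nil // nil lets the scan continue
}

// buildOracle maps each literal to the 1-based IDs of the patterns equal
// to it, mirroring patterns[i].Id = i + 1.
func buildOracle(patternStrings []string) map[string][]uint {
	oracle := make(map[string][]uint)
	for i, s := range patternStrings {
		oracle[s] = append(oracle[s], uint(i+1))
	}
	return oracle
}

// diffMatches compares the IDs reported for input against the oracle and
// describes every unexpected or missing match.
func diffMatches(oracle map[string][]uint, input string, reported []uint) []string {
	want := make(map[uint]bool)
	for _, id := range oracle[input] {
		want[id] = true
	}
	got := make(map[uint]bool)
	var problems []string
	for _, id := range reported {
		got[id] = true
		if !want[id] {
			problems = append(problems, fmt.Sprintf("%q: unexpected match for pattern %d", input, id))
		}
	}
	for _, id := range oracle[input] {
		if !got[id] {
			problems = append(problems, fmt.Sprintf("%q: missing match for pattern %d", input, id))
		}
	}
	return problems
}

func main() {
	oracle := buildOracle([]string{"string1"})
	// Hypothetical observations shaped like the reported symptom. With gohs,
	// each collector would be filled by database.Scan(data, scratch, c.handle, nil).
	observed := map[string][]uint{"string2": {1}}
	for _, in := range []string{"string1", "string2", "string4"} {
		c := &matchCollector{}
		for _, id := range observed[in] {
			_ = c.handle(id, 0, uint64(len(in)), 0, nil)
		}
		for _, p := range diffMatches(oracle, in, c.ids) {
			fmt.Println(p)
		}
	}
}
```

Logging these discrepancies together with the goroutine or request that performed the scan can help show whether wrong results correlate with concurrent use of the same scratch.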

flier commented 2 years ago

First, Scratch is not thread-safe; please make sure each concurrent scan uses its own Scratch.

While the Hyperscan library is re-entrant, the use of scratch spaces is not. For example, if by design it is deemed necessary to run recursive or nested scanning (say, from the match callback function), then an additional scratch space is required for that context.
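One common way to follow that advice in a server handling many concurrent requests is to keep a pool of scratch spaces and check one out per scan, so no two goroutines ever use the same scratch at once. The sketch below is a generic, stdlib-only pool (Go 1.18+ generics). With gohs, the constructor would presumably wrap `proto.Clone()`, where `proto` is the scratch allocated once with `hyperscan.NewScratch(database)`; check that your gohs version provides `Scratch.Clone` and `Scratch.Free`, since that is an assumption here. It uses an explicit free list rather than `sync.Pool` because `sync.Pool` may silently drop idle items, while a Hyperscan scratch is memory allocated by the C library that is normally released explicitly. `ScratchPool`, `Drain`, and `fakeScratch` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ScratchPool (hypothetical) hands out values that must never be used by
// more than one goroutine at a time, reusing them via an explicit free list.
type ScratchPool[T any] struct {
	mu    sync.Mutex
	free  []T
	newFn func() (T, error) // with gohs, assumed to wrap proto.Clone()
}

func NewScratchPool[T any](newFn func() (T, error)) *ScratchPool[T] {
	return &ScratchPool[T]{newFn: newFn}
}

// Get returns an idle value, or creates a new one if none is free.
func (p *ScratchPool[T]) Get() (T, error) {
	p.mu.Lock()
	if n := len(p.free); n > 0 {
		v := p.free[n-1]
		p.free = p.free[:n-1]
		p.mu.Unlock()
		return v, nil
	}
	p.mu.Unlock()
	return p.newFn()
}

// Put returns a value to the pool; the caller must not use it afterwards.
func (p *ScratchPool[T]) Put(v T) {
	p.mu.Lock()
	p.free = append(p.free, v)
	p.mu.Unlock()
}

// Drain removes all idle values and passes each to release (with gohs,
// presumably a wrapper around Scratch.Free). Checked-out values are untouched.
func (p *ScratchPool[T]) Drain(release func(T)) {
	p.mu.Lock()
	idle := p.free
	p.free = nil
	p.mu.Unlock()
	for _, v := range idle {
		release(v)
	}
}

// fakeScratch stands in for *hyperscan.Scratch in this demo; busy is used
// to detect two goroutines holding the same value at once.
type fakeScratch struct{ busy int32 }

func main() {
	var created int32
	pool := NewScratchPool(func() (*fakeScratch, error) {
		atomic.AddInt32(&created, 1)
		return &fakeScratch{}, nil
	})

	var shared int32
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s, err := pool.Get()
			if err != nil {
				return
			}
			defer pool.Put(s)
			// A real handler would call database.Scan(..., s, ...) here.
			if !atomic.CompareAndSwapInt32(&s.busy, 0, 1) {
				atomic.AddInt32(&shared, 1)
			}
			atomic.StoreInt32(&s.busy, 0)
		}()
	}
	wg.Wait()
	fmt.Println("scratches created:", atomic.LoadInt32(&created), "shared uses:", atomic.LoadInt32(&shared))
}
```

The pool grows to the peak number of concurrent scans and then reuses those scratches, so allocation cost is paid only under new peaks. For nested scans started from inside a match callback, the inner scan would need its own `Get`, matching the note above about recursive scanning.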

Can you reproduce the issue with a minimal example? Thanks.