LilyPad / GoLilyPad

GNU General Public License v3.0
98 stars 58 forks source link

regex is slow, lets stop using it. #45

Closed suedadam closed 4 years ago

suedadam commented 7 years ago

Old regex = 19374 ns/run with username “TestMe” regexreplace = 19 ns/run with username “TestMe” Tested with https://play.stackimpact.com/

Tzeentchful commented 7 years ago

Your test results look a bit skewed from running in a VM I wrote a small benchmark and ran it on my MBP and ran 3x faster than your results https://gist.github.com/Tzeentchful/78ade6f92ade5d7cf3c916b43cca809c

go test -bench=.
BenchmarkRegex-8          200000              5627 ns/op
BenchmarkLoop-8         100000000               12.9 ns/op

On a Digital Ocean droplet it tells a different story

go test -bench=.
BenchmarkRegex    200000         14510 ns/op
BenchmarkLoop   50000000            63.5 ns/op

Even taking the worst case scenario its only 0.01ms difference. I don't think there is any added benefit as the code is only run once at the start of a session and thus not performance critical. And you trade off code readability.

Also the regexreplace function should be defined in the utils.go Its messy having it the Session class

suedadam commented 7 years ago

@Tzeentchful Yeah as I said I ran it from StackImpact and it did that, my results were different than it when I just ran your benchmark as well. That being said, I agree that its not performance critical, don't agree on code readability (as its pretty simple).

That being said, do you have any suggestions on benchmarking the handshake process of the code by any chance? I have a fork over at https://github.com/Psychz/GoLilyPad where I implemented worker pools for handling the traffic and I'm not sure how to benchmark it in comparison to the way you guys have it.

RoamingAnt commented 7 years ago

@suedadam I agreed that it's simple code. But for anyone reading through the code, it's a bit less obvious that is doing string validation versus using regex. Just in my opinion.

And benchmarking that change shouldn't be too hard if you're looking only at the overhead it would add to the code. Just need to make a fake connection(or bypass the networking stack entirely) and write some hard coded replies for the auth process. However, workers might only become beneficial when the server is under heavy load. That would probably involve creating a set of thin clients to simulate a realistic data load.

suedadam commented 7 years ago

I'll start working on that. Oh and also in your benchmark test, you run regexreplace twice. https://gist.github.com/Tzeentchful/78ade6f92ade5d7cf3c916b43cca809c#file-main_test-go-L26-L33

romanalexander commented 7 years ago

The larger issue here is this Regex string doesn't allow for offline-mode clients with non-standard names to join. Proper solution is to use url escaping.

suedadam commented 7 years ago

@romanalexander I wasn't aware you can login with non-standard names for Offline-mode servers, doesn't the regex that Lilypad uses already not allow it? This was really only to make an equivalent to the one that is already being used.

ammario commented 7 years ago

Regex is fast when used properly.

The regex's speed could be improved to around 1000ns/match by compiling outside the hot path.

For example

var nameRegex = regexp.MustCompile("^[a-zA-Z0-9_]+$")

nameRegex.Match(this.name)

If there's still interest in more performance, the validation function should be named appropriately, use a for range, and use rune literals like 'a' instead of their int counterparts.

coelho commented 4 years ago

Late reply, I know,

This was not merged in favor of code clarity. The code being easy to understand is important & regex achieves that.

coelho commented 4 years ago

Implemented the solution suggested by @ammario https://github.com/LilyPad/GoLilyPad/commit/2cbbdcbea8ac72ea5934df0ca9ffa57aa26b40b0

ammario commented 4 years ago

This is insane