strings/bytes: LastIndexByte is significantly slower than IndexByte

golang / go

The Go programming language

https://go.dev

BSD 3-Clause "New" or "Revised" License

123.43k stars 17.59k forks source link

strings/bytes: LastIndexByte is significantly slower than IndexByte #36891

Open kirillx opened 4 years ago

kirillx commented 4 years ago

What version of Go are you using (`go version`)?

$ go version
go version go1.13.1 darwin/amd64

Does this issue reproduce with the latest release?

yes.

What operating system and processor architecture are you using (`go env`)?

Both Linux/MacOS

What did you do?

I was using multipart.NewReader() to process multi-part responses from Cloud REST API. It turned out that ~1/3 of profile is spent in mime/multipart/multipart.go :: scanUntilBoundary() -> bytes.LastIndexByte().

After looking into it, it is no wonder as bytes.LastIndexByte() is not using any optimisations and compiled into simple loop iterating over bytes, no REP SCASB instruction is used on Intel (nor SSE).

What did you expect to see?

bytes.LastIndexByte() to use SSE or at least REP SCASB optimised code.

What did you see instead?

simple byte to byte loop in asm code.

ALTree commented 4 years ago

Thanks for reporting this.

IndexByte has been optimized significantly more than LastIndexByte, and it appears to be ~7x faster in comparable benchmarks (i.e. IndexByte(aaaa...ax) vs IndexLastByte(xa...aaaa), when looking for x).

I guess we could port the same implementation to LastIndex.

martisch commented 4 years ago

I was just about to create a new Issue for this but notice this one exists already. I came across this as profiling data shows LastIndexByte is used alot nowdays (e.g. in proto code) to account for a good chunk of overall CPU time profiled.

I agree we should optimize strings/bytes.LastIndexByte similar to IndexByte: https://github.com/golang/go/blob/master/src/internal/bytealg/index_amd64.s

networkimprov commented 4 years ago

Maybe status = NeedsFix?

martisch commented 4 years ago

Assigned myself earlier because I already have prototype that passes ./all.bash but needs benchmarking on a quiet machine and double checking of page boundary handling before sending for review.

gopherbot commented 3 years ago

Change https://golang.org/cl/266538 mentions this issue: strings, bytes: use SIMD for LastIndexByte on amd64

quasilyte commented 2 years ago

@martisch are you planning on merging that CL at some point? Do you need any help with testing/review/benchmarking?

martisch commented 2 years ago

I can plan to merge it next cycle.

The last thing I was missing is a test that the page boundary at the beginning of the data is honoured. If someone could ammend the existing test (or helpers) to create a test string/byteslice where before (and after) the data the page is protected that would help. Last time I checked it only tested one direction but the the tests have changed recently and I did not check again.

Having beginning and end with protected pages tested to make sure operations using SIMD do not read to much data would also help existing code and can be an independent CL.

gopherbot commented 1 year ago

Change https://go.dev/cl/522475 mentions this issue: internal/bytealg: add generic LastIndexByte{,String}