golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.23k stars 17.57k forks source link

cmd/compile: arithmetic optimization for shifts #69635

Open StefanEnspired opened 1 week ago

StefanEnspired commented 1 week ago

Proposal Details

I have a simple expression optimization suggestion that I noticed in some of my code. The minimal snippet that produces the offending code is this:

func calc(a uint64) uint64 {
    v := a >> 20 & 0x7f
    return v << 3
}

In my actual code, the calculation on the first line is actually happening in another (inlined) function; this terser version leads to the same result:

SHRQ $20, AX
ANDL $127, AX
SHLQ $3, AX

Obviously, shift left and shift rights do not cancel each other out in the same way as additions and subtractions do, but they are almost close enough. Addition and subtraction will in fact be combined, also in the inlined-function case:

func calc(a uint64) uint64 {
    v := a - 6
    return v + 2
}

ADDQ $-4, AX

For combining shifts in the same way, an additional masking AND would be required, but if it is already there, it can be recycled to efficiently combine the shifts. Actually the operations do not even have to oppose each other. Combining two adds or two shifts in the same direction would work just as well.

For x << i & m >> j with constants i, j and m, the combined shifting distance d would be i-j (or i+j for << <<, or -i-j for >> >>, .. you get the idea). We would then end up with the optimized version:

x DIR-SHIFT d & (m SH j)

where DIR-SHIFT is the idealized shift operation in either direction depending on the polarity of d, and SH is whatever the original second shift direction was.

The final result for the original function would thus be:

SHRQ $17, AX
ANDL $1016, AX
ianlancetaylor commented 6 days ago

There are no API changes here, so taking this out of the proposal process.

randall77 commented 6 days ago

Sounds reasonable.

Note that this depends on architecture. On arm64 it is already just 2 instructions.

UBFX    $20, R0, $7, R1
LSL $3, R1, R0
StefanEnspired commented 6 days ago

Note that this depends on architecture. On arm64 it is already just 2 instructions.

Only for "nice" mask constants, though.