cmd/compile: possible latent codegen issue on amd64 removing zero extensions

josharian commented 4 years ago

I'm not sure whether it is possible to trigger this bug right now, but I think there may be an issue lurking.

Consider code like:

func f(x uint32) uint64 {
    return uint64(x & 0xFFFFFFFF)
}

The outmost op is ZeroExt32to64, which gets lowered to MOVLQZX. The innermost op ends up being (ANDLconst [0xFFFFFFFF] x).

Then this rule triggers: (MOVLQZX x) && zeroUpper32Bits(x,3) -> x. ANDLconst is among the ops listed as zeroing the upper 32 bits of a register, so we eliminate the zero extension.

Then this rule triggers: (ANDLconst [c] x) && int32(c)==-1 -> x, eliminating the ANDLconst.

This leaves us with just x, but without any zero extension, which could leave junk in the top half of the register.

As it stands, this doesn't happen in this function, because the And32 is eliminated during generic optimization. But there are ways to sneak an & -1 past generic optimization, e.g. with a constant shift, which is how I discovered this issue. We are also saved in this case by using a MOVL to load the value from an arg. But we could be using a computed value instead. So I haven't convinced myself that this couldn't actually cause bad code generation right now for just the right input.

There are two possible diagnoses/fixes:

zeroUpper32Bits assumes that the inner op actually gets executed, which is wrong, in which case we need to pare back the list of accepted ops, probably to only loads, which I think cannot be eliminated at that point (?), since dse occurs earlier
the ANDLconst rule is unsound because eliminating it eliminates the side-effects of zeroing the top half of the register

I'm strongly inclined to the first diagnosis and fix. Thoughts?

cc @randall77 @cherrymui

randall77 commented 4 years ago

I've never been super happy about our whole sign extension story. In machine-independent space I think we're good, but we play very fast and loose in the machine-dependent space. Maybe there's some invariant we can use where we guarantee to only use the bits of a value that exist in its type. So there would be two effective ANDLconsts, one with a 32-bit type and one with a 64-bit type. ANDLconst [-1] could be removed in the former but not in the latter.

josharian commented 4 years ago

Agreed. And at the same time, we also fail to remove a bunch of extraneous extensions

That approach sounds good to me, assuming the details work out ok. It could cause an explosion in number of ops and rules (which is already a problem), and might be tricky with shifts.

I don't think I want to sign up for tackling this one. :)

cherrymui commented 4 years ago

Same for me. I'm also not super happy about the extension removing.

(ANDLconst [c] x) && int32(c)==-1 -> x

I think we should only do this if the type is 32-bit wide or smaller. But how good are we keeping the types accurate?

(MOVLQZX x) && zeroUpper32Bits(x,3) -> x

The LHS is 64-bit wide, whereas the RHS is 32-bit wide. This kind of rules have always troubled me. Maybe we could have a way of doing this in a type-preserving way? Something like (op x:(op2 ...)) -> (op2 <t> ...) ? (this basically makes copy of x (not OpCopy, but copy the expression of x) in the new type.)

gopherbot commented 4 years ago

Change https://golang.org/cl/220499 mentions this issue: cmd/compile: mark Lsyms as readonly earlier

gopherbot commented 4 years ago

Change https://golang.org/cl/226957 mentions this issue: internal/lsp/regtest: dump gopls logs on test failure

ianlancetaylor commented 4 years ago

I guess we didn't do anything here for 1.15. Moving to 1.16 milestone.

aclements commented 3 years ago

Since there's no current plan to work on this, bumping to "Unplanned"

golang / go

cmd/compile: possible latent codegen issue on amd64 removing zero extensions #36897