SystemZ target - Incorrect code generated for accessing thread_local variables when high-word feature is available

Quuxplusone commented 4 years ago


Bugzilla Link	PR44254
Status	RESOLVED FIXED
Importance	P normal
Reported by	Slavomír Kučera (slavomir.kucera@broadcom.com)
Reported on	2019-12-09 01:09:44 -0800
Last modified on	2020-03-03 07:51:38 -0800
Version	9.0
Hardware	Other other
CC	htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org, paulsson@linux.vnet.ibm.com, uweigand@de.ibm.com
Fixed by commit(s)
Attachments	`thrd_ptr.patch` (1157 bytes, text/plain) `test.ll` (2619 bytes, text/plain)
Blocks
Blocked by
See also

I came across a bug in the generated code for SystemZ target when accessing
thread_local variables in functions that have 7 or more parameters.

The problem can be seen here
https://godbolt.org/z/v77J_p

while extracting the values from AR0 and AR1

        ear     %r0, %a0                 ; copy AR0 to GR0L
        risblg  %r14, %r0, 0, 159, 32    <<<<<<<< copy GR0H to GR14L
        sllg    %r14, %r14, 32
        ear     %r0, %a1
        risblg  %r14, %r0, 0, 159, 32    <<<<<<<< the same issue here

I think I managed to track down the problem to the following pieces of code:

in SystemZTargetLowering::lowerThreadPointer:

  SDValue TPHi = DAG.getCopyFromReg(Chain, DL, SystemZ::A0, MVT::i32);

combined with the 'high-word' feature dependent part in
SystemZTargetLowering::SystemZTargetLowering:

  if (Subtarget.hasHighWord())
    addRegisterClass(MVT::i32, &SystemZ::GRX32BitRegClass);
  else
    addRegisterClass(MVT::i32, &SystemZ::GR32BitRegClass);

the compiler thinks there is an option to copy the content of access registers
into high half of the general purpose register, which is not available on the
platform.

Quuxplusone commented 4 years ago

Attached thrd_ptr.patch (1157 bytes, text/plain): Possible solution

Quuxplusone commented 4 years ago

Attached test.ll (2619 bytes, text/plain): source llvm ir

Quuxplusone commented 4 years ago

Thanks for the bug report! I can reproduce the problem locally, however the symptom is somewhat different. I do not see any wrong code being generated, instead I see the assertion failure "Impossible reg-to-reg copy" from copyToPhysReg.

I'm wondering how the godbolt.org version of LLVM is built/configured to get the wrong code output -- that seems odd.

In any case, the root cause either way is that we have no instruction to copy between the AR32 register class and the GRH32 register class.

The fix you proposed does handle this one specific case, but I guess it really isn't a full solution -- we could get other accesses to access registers, e.g. via inline asm. Those might still run into the bug.

I'm not completely sure what the full fix should look like. Jonas, you've been looking into high register problems recently -- can you have a look whether this can be done via regalloc hints or the like?

Quuxplusone commented 4 years ago

I've observed both behaviors on different versions, I can't recall the commit
that introduced that assert (which is certainly preferable to bad code).

My proposal obviously fixes this one specific case only, on the other hand this
one case cannot be really addressed by the user.
Regarding the inline asm, I was not aware that the 32-bit halfs could be
requested...

Quuxplusone commented 4 years ago

(In reply to Slavomír Kučera from comment #4)
> I've observed both behaviors on different versions, I can't recall the
> commit that introduced that assert (which is certainly preferable to bad
> code).

Well, the assert itself has been in forever, so whatever caused the change in
behavior must be somewhere else.  In any case, that's probably not the
important issue anyway.

> My proposal obviously fixes this one specific case only, on the other hand
> this one case cannot be really addressed by the user.
> Regarding the inline asm, I was not aware that the 32-bit halfs could be
> requested...

No, but you can use inline asm to request an *access register*.  If that causes
the value in an access register to be moved to/from an general "int" variable,
and the register allocator just happens to chose a GPR high half for that
variable, then you run into the same problem.

Quuxplusone commented 4 years ago

I tried using a new Pseudo instruction to be selected in this case of a
CopyToReg, to then be lowered via custom insertion to a SAR while constraining
the virtreg to GR32. This however did not work as the CopyToReg is not an
ordinary ISD node, but depends on special handling. Trying to replace it
quickly broke things in the InstrEmitter, so I don't think this is a good idea.

Since CopyToReg and CopyFromReg are special nodes it seems reasonable to me to
add a check in InstrEmitter::EmitSpecialNode() to allow the target to constrain
the registerclass when emitting a COPY in presence of a physreg. This is then
reflecting the targets ability to emit the actual COPY instruction itself.

As a simpler alternative, I think the SystemZ backend could have a fall-back in
SystemZPostRewrite.cpp (which is run also at -O0) that would rotate the GR64
register twice in case of a COPY to/from the high-part. That would handle *any*
such cases, including inlineasm. I wonder if this is acceptable
performancewise? In combination with Slavomirs patch this would then only
happen with inlineasms (and possibly other things I am not aware of).

Quuxplusone commented 4 years ago

(In reply to Jonas Paulsson from comment #6)
> Since CopyToReg and CopyFromReg are special nodes it seems reasonable to me
> to add a check in InstrEmitter::EmitSpecialNode() to allow the target to
> constrain the registerclass when emitting a COPY in presence of a physreg.
> This is then reflecting the targets ability to emit the actual COPY
> instruction itself.

This might make sense.  I'm not quite sure what (if any?) restrictions ought to
hold for COPY instructions in general.  Is the target really required to be
able to implement *any* move between register classes?  That seems odd.
Probably something to be clarified on the list.

> As a simpler alternative, I think the SystemZ backend could have a fall-back
> in SystemZPostRewrite.cpp (which is run also at -O0) that would rotate the
> GR64 register twice in case of a COPY to/from the high-part. That would
> handle *any* such cases, including inlineasm. I wonder if this is acceptable
> performancewise? In combination with Slavomirs patch this would then only
> happen with inlineasms (and possibly other things I am not aware of).

That would certainly be the fallback if we cannot find any better solution.

Quuxplusone commented 4 years ago

Patch proposed at https://reviews.llvm.org/D75014

Quuxplusone commented 4 years ago

Fixed by ae4d39c9

Quuxplusone / LLVMBugzillaTest

SystemZ target - Incorrect code generated for accessing thread_local variables when high-word feature is available #43224