aarch64 codegen emits add x28,xzr,#1 not accepted by gnu assembler

Quuxplusone commented 7 years ago


Bugzilla Link	PR34674
Status	RESOLVED FIXED
Importance	P normal
Reported by	Khem Raj (raj.khem@gmail.com)
Reported on	2017-09-19 13:37:10 -0700
Last modified on	2017-09-25 06:53:24 -0700
Version	5.0
Hardware	PC All
CC	diana.picus@linaro.org, ditaliano@apple.com, efriedma@quicinc.com, gberry@codeaurora.org, kristof.beyls@arm.com, llvm-bugs@lists.llvm.org, matze@braunis.de, rengolin@gmail.com, t.p.northover@gmail.com
Fixed by commit(s)
Attachments	`simpler.ll` (8788 bytes, application/octet-stream)
Blocks	PR34695
Blocked by
See also

I have a testcase https://uclibc.org/~kraj/a.cpp

Which fails to compile/assemble when using -no-integrated-as on aarch64, but it
works
ok when using -integrated-as

clang -target aarch64-linux-gnu a.cpp  -O2 -c -std=gnu++11
works

but
clang -target aarch64-linux-gnu a.cpp  -O2 -c -std=gnu++11 -no-integrated-as
throws

/tmp/tmp.ZfaD5q6lwY/a-8fa1ba.s:25688: Error: integer register expected in the
extended/shifted operand register at operand 3 -- `add x28,xzr,#1'
/tmp/tmp.ZfaD5q6lwY/a-8fa1ba.s:25717: Error: integer register expected in the
extended/shifted operand register at operand 3 -- `add x1,xzr,#1'
/tmp/tmp.ZfaD5q6lwY/a-8fa1ba.s:25718: Error: integer register expected in the
extended/shifted operand register at operand 3 -- `add x2,xzr,#1'

I then created .s file output with and without integrated-as and in both cases
I do see above instructions in assmebly file.

I thought it might be a problem only with GNU as so I created a small test case
like this

int foo() {
asm("add x28,xzr,#1") ;
return 0;
}

and now when I compile it, I get errors with both GNU and internal asm

So it does seem that instruction is not accepted by both assemblers but when
using internal assembler and not generating .s files these instructions might
be getting optimized out ?

I think some aarch64 expertise help is needed.

Quuxplusone commented 7 years ago

Hi Raj, this is an encoding problem.

Both ZR and SP registers are encoded as 31, and depending on the instructions,
they're either zero or stack.

ADDri can operate on the stack pointer, which means XZR is an invalid operand.

ADDrr can operate on the zero register, which means #1 is an invalid operand.

Clang "works" when outputting the object file directly probably because the
encoding 31 is also valid for SP, so it emits that. But from your large file, I
can't tell which one would be the correct encoding. It's also possible that
none are (sp+#1 or zr+x?) and this got there from a whole different pattern.

It'd be good to have a reduced case, and possibly a bisection.

cheers,
--renato

Quuxplusone commented 7 years ago

I'm pretty sure xzr is intended. The sequence

        add     x1, xzr, #1             // =1
        add     x2, xzr, #1             // =1
        bl      _ZN2kj1_17HeapArrayDisposer12allocateImplEmmmPFvPvES4_

is marshalling "elementCount" and "capacity" into x1 and x2. There's no reason
for those to be based on sp.

It's difficult to get more detailed info on such a large file though. I'll see
if I can at least llvm-extract the offending function.

Quuxplusone commented 7 years ago

Attached simpler.ll (8788 bytes, application/octet-stream): Isolated function demonstrating issue

Quuxplusone commented 7 years ago

replaceZeroVectorStore is doing something weird with XZR; instead of using a CopyFromReg to read the value of the register, it's using the register itself as an operand to an ISD::STORE. This somehow works in some cases, but it's not really correct, and it eventually bites us here.

Not sure if there's any good way to detect this sort of situation... maybe registers should have a different MVT?

See https://reviews.llvm.org/rL286875 .

Quuxplusone commented 7 years ago

Taking a look now...

Quuxplusone commented 7 years ago

As Eli alluded to, this bug is triggered if store->load forwarding happens
after the store vector 0 split optimization and the loaded value feeds an add:

target triple = "aarch64--linux-gnu"
define i64 @foo(<2 x i64>* %p) {
entry:
  store <2 x i64> zeroinitializer, <2 x i64>* %p
  %p2 = bitcast <2 x i64>* %p to i64*
  %ld = load i64, i64* %p2
  %add = add i64 %ld, 1
  ret i64 %add
}

The store vector 0 split optimization replaces the stored vreg with XZR, then
the store->load forwarding replaces the load with XZR, which if the load is
used by an add ends up generating the invalid instruction:

    add x0, xzr, #1

The store vector 0 split optimization inserts XZR operands to prevent
DAGCombiner::MergeConsecutiveStores from undoing the splitting of vector
stores, but in light of this bug that doesn't seem like a good idea.  I'm
looking into whether this can be achieved through other means or if not if this
vector store splitting should just be moved to after ISel.

Quuxplusone commented 7 years ago

I think you get the behavior you want if you just replace the call to getRegister() with a call to getCopyFromReg()?

It would also be easy to clean up after isel by pattern-matching the store instruction, if you want to go that route.

Quuxplusone commented 7 years ago

Yeah, inserting a CopyFromReg is a good idea.  Patch up for review here:
https://reviews.llvm.org/D38146

Quuxplusone commented 7 years ago

Should be resolved by r313916

Quuxplusone commented 7 years ago

thanks, I would like to request to backport this to release_50

Quuxplusone commented 7 years ago

Adding as a blocker to 5.0.1 meta.

Quuxplusone commented 7 years ago

Removing the release directly, created a merge bug, copied hans.

Quuxplusone commented 7 years ago

_Bug 34675 has been marked as a duplicate of this bug._

Quuxplusone / LLVMBugzillaTest

aarch64 codegen emits add x28,xzr,#1 not accepted by gnu assembler #33646