Bugzilla Link | PR48024 |
Status | RESOLVED FIXED |
Importance | P normal |
Reported by | Tom Hender (ToHe_EMA@gmx.de) |
Reported on | 2020-10-30 12:14:00 -0700 |
Last modified on | 2020-11-20 22:15:05 -0800 |
Version | trunk |
Hardware | All All |
CC | andrea.dibiagio@gmail.com, htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, matthew.davis@sony.com, tstellar@redhat.com |
Fixed by commit(s) | rG0e20666db3ac280affe82d31b6c144923704e9c4, rG973b95e0a84 |
Attachments | |
Blocks | PR47800 |
Blocked by | |
See also | PR48033 |
Hi Tom,
Apologies for this issue.
I have a fix for it, which I plan to commit tomorrow.
This issue is likely to be a regression introduced by my rewrite of the LSUnit
in commit 5578ec32f9c4fef46adce52a2e3d22bf409b3d2c.
=====
As a side note (unrelated to this bug):
The code generated by LLVM for your code snippet is sub-optimal. It is as if
the compiler tried very hard to keep alive the stack slot containing the old
value of MXCSR (i.e. slot at location `rsp - 4`) until the end of the function.
This is suboptimal because it means that an extra stack slot (rsp - 8) has to
be used for the new value of MXCSR. If instead the original slot was reused,
the compiler could have simply emitted a MR variant of AND (read-modify-write),
and this would have avoided the use of an extra MOV.
GCC gets this right: the entire sequence is three instructions plus the RET.
I wonder if this poor codegen has to do with the fact that STMXCSR is defined
as having "unmodeled side-effects". It might be that somehow that prevents the
compiler from commuting the original AND and using a RMW variant instead.
Alternatively StackSlotColoring is not doing a good job at merging the two
stack slots. This is just me speculating on what the issue might be in the code
generator.
On the plus side, the compiler is smart at taking advantage of the red-zone in
this case. Part of me wasn't expecting to see negative offsets used with RSP.
In this particular case, it makes perfect sense and it avoids having to emit
an extra SUB (of RSP) at the beginning, plus an extra ADD at the end.
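For readers without the original snippet at hand, here is a minimal sketch of the kind of function the side note above describes: read MXCSR, AND it with a constant, and write it back. The function name, the helper, and the choice of mask are assumptions made for illustration only; they are not taken from the reporter's code.

```c
#include <stdint.h>
#include <xmmintrin.h> /* _mm_getcsr / _mm_setcsr (x86 SSE intrinsics) */

/* Assumed mask for illustration: MXCSR bits 13-14 select the rounding mode. */
#define MXCSR_ROUNDING_MASK 0x6000u

/* Hypothetical pure helper: clear the rounding-control bits of an MXCSR value. */
static inline uint32_t clear_rounding_bits(uint32_t mxcsr) {
    return mxcsr & ~MXCSR_ROUNDING_MASK;
}

/* Hypothetical function of the shape discussed above:
 * STMXCSR (read), AND with a constant, LDMXCSR (write back).
 * Clearing both rounding bits selects round-to-nearest. */
void set_round_to_nearest(void) {
    _mm_setcsr(clear_rounding_bits(_mm_getcsr()));
}
```

If the stack slot written by STMXCSR were reused, a function like this could in principle lower to a STMXCSR into the slot, an AND with a memory destination on that same slot, and an LDMXCSR from it, which matches the three-instruction sequence attributed to GCC above.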
@andreadb Please can you raise [Comment #1] as a separate x86 bug?
Fixed on master by commit 0e20666db3ac280affe82d31b6c144923704e9c4
@Tom, could you please verify that the fix works for you?
Thanks.
@Simon: I am going to raise a separate bug for the poor codegen.
Raised bug 48033 for the poor codegen issue.
(In reply to Andrea Di Biagio from comment #3)
> Fixed on master by commit 0e20666db3ac280affe82d31b6c144923704e9c4
>
> @Tom, could you please verify that the fix works for you?
This should be merged into 11.x - assuming Tom confirms the fix.
Thank you.
The crash is now fixed for me as well.
The inefficient code generation from Clang luckily doesn't affect my real code, both because it must remember the old MXCSR and because it's written with inline assembly anyway. I don't trust compilers around rounding mode changes anymore.
Hi Andrea,
What is your opinion on backporting this?
https://reviews.llvm.org/rG0e20666db3ac280affe82d31b6c144923704e9c4
(In reply to Tom Stellard from comment #7)
> Hi Andrea,
>
> What is your opinion on backporting this?
>
> https://reviews.llvm.org/rG0e20666db3ac280affe82d31b6c144923704e9c4
Hi Tom,
My opinion is that it should be merged into the release branch as it prevents a
crash when the input assembly contains store instructions that also have
unmodeled side effects.
Merged: 973b95e0a84