Open Quuxplusone opened 6 years ago
Bugzilla Link | PR35982 |
Status | NEW |
Importance | P enhancement |
Reported by | Ilia Taraban (ilia.taraban@intel.com) |
Reported on | 2018-01-17 01:36:37 -0800 |
Last modified on | 2020-08-30 15:19:29 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | Andrei.l.grischenko@intel.com, craig.topper@gmail.com, efriedma@quicinc.com, gonzalo.gadeschi@gmail.com, jyknight@google.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, richard-llvm@metafoo.co.uk, tstellar@redhat.com |
Fixed by commit(s) | |
Attachments | |
Blocks | PR42319 |
Blocked by | |
See also | PR36354 |
Oops, I mixed up nice-194.s and nice-195.s.
So it should be:
=============== nice-194.s ============
movq 8(%esp), %mm0
punpckhdq %mm0, %mm0 # mm0 = mm0[1,1]
movd %mm0, %ecx
emms
cvtsi2ssl %eax, %xmm0
cvtsi2ssl %ecx, %xmm1
=======================================
=============== nice-195.s ============
cvtsi2ssl %eax, %xmm0
movq 8(%esp), %mm0
emms
punpckhdq %mm0, %mm0 # mm0 = mm0[1,1]
movd %mm0, %ecx
cvtsi2ssl %ecx, %xmm1
=======================================
Simon, do we just need to mark EMMS/FEMMS as defining the MMX registers? That seems to at least fix this case. Maybe mark it as using them too?
(In reply to Craig Topper from comment #2)
> Simon, do we just need to mark EMMS/FEMMS as defining the MMX registers? That
> seems to at least fix this case. Maybe mark it as using them too?
Yes, and any x87 registers as well. I was considering just making them
terminators to prevent any crossover - even though that'd affect all
instructions not just x87/mmx.
(In reply to Craig Topper from comment #2)
> Simon, do we just need to mark EMMS/FEMMS as defining the MMX registers? That
> seems to at least fix this case. Maybe mark it as using them too?
I don't think either of those is sufficient, given the definitions would be
dead (so you're effectively just clobbering the registers).
The x86-64 ABI says "The CPU shall be in x87 mode upon entry to a function."
i.e. the x87 tag word should be set to all ones. So the right way to handle
this is to explicitly model the tag word as a register: MMX instructions clobber
it, emms defines it, and call/return instructions use it.
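The tag-word modeling proposed above could be sketched in TableGen terms roughly as follows. This is a hypothetical sketch, not verbatim X86RegisterInfo.td/X86InstrMMX.td syntax: FPTW is an invented register name, and the instruction definitions are abbreviated.

```tablegen
// Hypothetical: model the x87 tag word as an architectural register.
def FPTW : X86Reg<"fptw", 0>;

// Every MMX instruction implicitly clobbers the tag word:
//   let Defs = [FPTW] in { ... MMX instruction definitions ... }

// EMMS/FEMMS (re)define it to the all-ones x87 state, alongside the
// MM0-7 / ST0-7 clobbers:
//   let Defs = [FPTW, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7] in
//     def MMX_EMMS : ... "emms" ...;

// Call and return instructions implicitly use it, so EMMS cannot be
// sunk past a call/return and MMX code cannot be hoisted across one:
//   let Uses = [FPTW] in { ... CALL / RET definitions ... }
```

With these implicit operands, the post-RA scheduler and other MachineInstr passes would see a genuine dependence chain (MMX def -> emms def -> call/ret use) instead of dead clobbers.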
There are also other potential "scheduling" problems: currently, MMX intrinsics
are marked IntrNoMem, so an IR transform could sink an MMX instruction past an
EMMS. But that's sort of orthogonal to the MachineInstr modeling.
Is this a duplicate of https://bugs.llvm.org/show_bug.cgi?id=15388 ? It does look suspiciously similar.
I believe this bug causes undefined behavior in Rust code using MMX registers. We have had a lot of recurring bugs on Linux, Windows, and macOS targets (mostly 32-bit) over the last two years: some floating-point tests would intermittently fail on some systems, the failures would disappear a couple of compiler builds later, and then they would reappear some time later.
We have started to manually emit emms
using inline assembly and that has fixed some of these intermittent issues for good, but this is probably something worth fixing at the LLVM level.
Frontends using the floating-point LLVM intrinsics should not have to use inline assembly to avoid undefined behavior.
A partial fix to clobber MM0-7 and ST0-7 on emms/femms was committed in r352642. This prevents the postRA scheduler from affecting the test case here.
This is not a complete fix and it can break in other ways.
Richard Smith came up with this test (I've slightly modified it for clarity).
Compile for 64-bit with -O2: https://godbolt.org/z/drn6zs.
=====
#include <mmintrin.h>
void g(__m64);
long long f() {
  volatile long long a = {0}, b = {0};
  // x87 usage
  volatile long double d = 1.0;
  long long aload = a, bload = b;
  double dload = d;
  bool s = __builtin_signbitl(dload);
  // MMX usage
  __m64 c = _m_punpckhbw(_mm_cvtsi64_m64(aload), _mm_cvtsi64_m64(bload));
  if (s) g(c);
  long long result = _mm_cvtm64_si64(c);
  _mm_empty();
  // MMX cleared
  return result;
}
=====
_Z1fv: # @_Z1fv
subq $72, %rsp
movq $0, 24(%rsp)
movq $0, 16(%rsp)
;; X87 instructions
fld1
fstpt 48(%rsp)
;; MMX instructions
movq 24(%rsp), %mm0
punpckhbw 16(%rsp), %mm0 # mm0 = mm0[4],mem[4],mm0[5],mem[5],mm0[6],mem[6],mm0[7],mem[7]
;; BROKEN: X87 instructions again...
fldt 48(%rsp)
fstpt 32(%rsp)
movswq 40(%rsp), %rax
testq %rax, %rax
jns .LBB0_2
;; MMX instructions again...
movq2dq %mm0, %xmm0
movq %mm0, 8(%rsp) # 8-byte Spill
callq _Z1gDv1_x
movq 8(%rsp), %mm0 # 8-byte Reload
.LBB0_2:
movq %mm0, %rax
;; Clear MMX state, safe to use x87 again.
emms
addq $72, %rsp
retq
James, this most recent issue isn't unique to LLVM, is it? I think GCC had the same issue before recent versions switched to lowering all MMX operations to SSE2.