Closed Quuxplusone closed 6 years ago
FYI, additional ABI issue possibility:
On FreeBSD 11.0-CURRENT (well, projects/clang380-import) when I try "clang++ -
std=c++11 -dM -E just_main.cpp" I get. . .
#define __BIGGEST_ALIGNMENT__ 8
. . .
#define __NATURAL_ALIGNMENT__ 1
on both TARGET_ARCH=powerpc and TARGET_ARCH=powerpc64 contexts.
But for the special port devel/powerpc64-gcc used for modern cross compiles of
FreeBSD ( /usr/local/bin/powerpc64-portbld-freebsd11.0-g++ -std=c++11 -dM -E
just_main.cpp ) I get:
#define __BIGGEST_ALIGNMENT__ 16
(Natural is not referenced. There is no such special port for 32-bit powerpc to
see its output.)
So there may be an ABI alignment mismatch someplace as well unless the ABI has
some optional-status alignment rules.
Without a ABI reference document I'm unsure which __BIGGEST_ALIGNMENT__ would
be correct for the FreeBSD powerpc ABI (if either one is). (Similarly for
powerpc64.)
Side notes:
powerpc64-portbld-freebsd11.0-g++ also reports:
#define __CMODEL_MEDIUM__ 1
. . .
#define __cpp_rvalue_reference 200610
But clang++ 3.8.0 reports nothing analogous to the first and:
#define __cpp_rvalue_references 200610
(gcc and clang have spelling differences here.)
"FreeBSD's" Justin Hibbits added a note in the FreeBSD bug entry (206990) that
I repeat here:
There is no provision in the ABI for a redzone in 32-bit powerpc. LLVM is
broken for 32-bit PowerPC regarding this, and there are comments in the source
code to this regard, to the effect:
(PPCFrameLowering.cpp):
// FIXME: On PPC32 SVR4, we must not spill before claiming the stackframe.
If a signal interrupts the thread at the precise wrong time (when creating the
stack frame, but before adjusting %r1), Bad Things will happen.
It appears that I originally forgot to set the Component for my submittal. Looking around I estimate that LLVM Codegen is the closest category: PPCFrameLowering.cpp is not under tools/clang/ at all so C++ would seem to be over specific.
(In reply to comment #1)
Just a note on __cpp_rvalue_reference vs. __cpp_rvalue_references :
> Side notes:
>
> powerpc64-portbld-freebsd11.0-g++ also reports:
>
> #define __CMODEL_MEDIUM__ 1
> . . .
> #define __cpp_rvalue_reference 200610
>
> But clang++ 3.8.0 reports nothing analogous to the first and:
>
> #define __cpp_rvalue_references 200610
>
> (gcc and clang have spelling differences here.)
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00175.html says:
> as PR71214 points out gcc uses a wrong feature test macro for C++11
> rvalue references: __cpp_rvalue_reference instead of the correct
> __cpp_rvalue_references.
So this specific point is a gcc problem but libraries targeting supporting what
gcc has done historically will now have 2 names to test the value of.
Patch for review: https://reviews.llvm.org/D24093
Committed in r280705.
(In reply to comment #6)
> Committed in r280705.
Thanks Krzysztof.
Dimitry Andric (dim at FreeBSD.org) has written:
> I merged the upstream fix to projects/clang390-import:
>
> https://svnweb.freebsd.org/changeset/base/305686
So FreeBSD head (current) for 12 will be adopting your changes.
As for my activity:
I'll not have access to powerpc64s/powerpcs for a few weeks yet.
(In reply to comment #6)
> Committed in r280705.
Did the commit also fix the stack pointer adjustment timing for the post-amble
code?
The wording that I see in the review and commit talks about the "claim" side of
things, which I interpret to be for the pre-amble side of things. If I
interpret the code right that is the side fixed. (I'm not clang/llvm code
literate so I could easily be wrong.)
My original submittal also noted the timing problem existed on the post-amble
side in 3.8.0's code generation:
0x1801b8c <Buf_AddBytes+104>: lwz r30,24(r31)
0x1801b90 <Buf_AddBytes+108>: lwz r29,20(r31)
0x1801b94 <Buf_AddBytes+112>: lwz r28,16(r31)
0x1801b98 <Buf_AddBytes+116>: lwz r27,12(r31)
0x1801b9c <Buf_AddBytes+120>: lwz r26,8(r31)
0x1801ba0 <Buf_AddBytes+124>: addi r1,r1,32 # Stack pointer adjusted first
0x1801ba4 <Buf_AddBytes+128>: lwz r0,4(r1)
0x1801ba8 <Buf_AddBytes+132>: lwz r31,-4(r1) # Then Frame Pointer load
happens
# "outside" the new stack range.
0x1801bac <Buf_AddBytes+136>: mtlr r0
0x1801bb0 <Buf_AddBytes+140>: blr
If such code can still be generated there is is a time frame needing a red-zone
to protect stack contents.
Hopefully I'm just wrong and this was fixed too.
The post-amble has not been fixed.
The epilogue part of the fix: https://reviews.llvm.org/D24466
Hopefully there is nothing else missing.
Committed in r282174.
(In reply to comment #11)
> Committed in r282174.
projects/clang390-import has been merged to head for FreeBSD --and is
now scheduled for an MFC from there into stable/11 in a month. It does
not yet have this fix.
But there is a chance for later 11.x-RELEASE's to support powerpc64 and/or
powerpc via clang, not just 12.0-RELEASE. (I'm not implying in a month
specifically, just that stable/11 might progress as llvm does as long
as the fixes backport into FreeBSD's 3.9.0 reasonably --and that 11.x
would pick up such changes from stable/11 .)
I'm still hoping that llvm 26519's and 26970's fixes will be applied to
clang 3.9.0 in FreeBSD's head soon. That would enable some significant
testing. I could try buildworld for powerpc64 and I could remove my "red
zone" signal delivery hack for powerpc to see if it is no longer needed
[lack of ABI violations].
[I want to be able to have good runs/results of the kyua tests before I
try to deal with testing buildkernel. This requires signal handling and
C++ exception handling to be working well even if the kernel does not
require all of that.]
For reference this is now in FreeBSD as https://svnweb.freebsd.org/changeset/base/309147
(In reply to comment #13)
> For reference this is now in FreeBSD as
> https://svnweb.freebsd.org/changeset/base/309147
Unfortunately there are problems handling use of r30, at least when floating
point is involved.
The used code from the compliler_rt __floatdidf seems to be just wrong in part
of its use of r30. The below initially uses the "df -m" example failure and its
code.
At first is saves r30 on the stack as part of the function preamble:
Dump of assembler code for function __floatdidf:
0x018029b8 <__floatdidf+0>: mflr r0
0x018029bc <__floatdidf+4>: stw r0,4(r1)
0x018029c0 <__floatdidf+8>: stwu r1,-32(r1)
0x018029c4 <__floatdidf+12>: stw r31,28(r1)
0x018029c8 <__floatdidf+16>: stw r30,24(r1)
Later it uses r30 for other things.
Then it restores it from the stack:
0x01802a08 <__floatdidf+80>: lwz r30,24(r1)
What it does with r30 next effectively makes r30 a parameter to the routine and
not just saved/restored:
0x01802a10 <__floatdidf+88>: lwz r3,-8(r30)
In the above the r30 from the caller's context is being used as the base for
accessing memory and putting that memory content in r3.
What it does next with r3 is one of the points where SIGSEGV is happening: For
"df -m" r3 ends up being 0 and the following fails.
0x01802a18 <__floatdidf+96>: lfs f12,0(r3)
In the fsck_ufs example it is the same sort of thing but r30 happens to be
initially 0 so it fails at an earlier stage.
Again there is the preamble that saves r30:
0x0181afcc <__floatdidf+0>: mflr r0
0x0181afd0 <__floatdidf+4>: stw r0,4(r1)
0x0181afd4 <__floatdidf+8>: stwu r1,-32(r1)
0x0181afd8 <__floatdidf+12>: stw r31,28(r1)
0x0181afdc <__floatdidf+16>: stw r30,24(r1)
Later it uses r30 for other things.
Then it restores it from the stack:
0x0181b01c <__floatdidf+80>: lwz r30,24(r1)
What it does with r30 next effectively makes r30 a parameter to the routine and
not just saved/restored:
0x0181b024 <__floatdidf+88>: lwz r3,-12(r30)
Again: In the above the r30 from the caller's context is being used as the base
for accessing memory and putting that memory content in r3.
But this time it turns out that r30 is 0 and the above also fails.
If the code had gotten that far it would have done the same thing with r3 that
"df -m" did in its failure:
0x0181b02c <__floatdidf+96>: lfs f12,0(r3)
For reference compiler-rt/lib/builtins/floatdidf.c has:
COMPILER_RT_ABI double
__floatdidf(di_int a)
{
static const double twop52 = 4503599627370496.0; // 0x1.0p52
static const double twop32 = 4294967296.0; // 0x1.0p32
union { int64_t x; double d; } low = { .d = twop52 };
const double high = (int32_t)(a >> 32) * twop32;
low.x |= a & INT64_C(0x00000000ffffffff);
const double result = (high - twop52) + low.d;
return result;
}
and the matching assembler code for the fsck_ufs example is (from gdb):
0x0181afcc <__floatdidf+0>: mflr r0
0x0181afd0 <__floatdidf+4>: stw r0,4(r1)
0x0181afd4 <__floatdidf+8>: stwu r1,-32(r1)
0x0181afd8 <__floatdidf+12>: stw r31,28(r1)
0x0181afdc <__floatdidf+16>: stw r30,24(r1)
0x0181afe0 <__floatdidf+20>: bl 0x182e96c <.got+20>
0x0181afe4 <__floatdidf+24>: mr r31,r1
0x0181afe8 <__floatdidf+28>: xoris r3,r3,32768
0x0181afec <__floatdidf+32>: lis r5,17200
0x0181aff0 <__floatdidf+36>: mflr r30
0x0181aff4 <__floatdidf+40>: stw r3,12(r31)
0x0181aff8 <__floatdidf+44>: stw r5,8(r31)
0x0181affc <__floatdidf+48>: lwz r3,-16(r30)
0x0181b000 <__floatdidf+52>: lwz r6,-20(r30)
0x0181b004 <__floatdidf+56>: lfd f1,8(r31)
0x0181b008 <__floatdidf+60>: stw r4,20(r31)
0x0181b00c <__floatdidf+64>: stw r5,16(r31)
0x0181b010 <__floatdidf+68>: lfd f13,16(r31)
0x0181b014 <__floatdidf+72>: lwz r0,36(r1)
0x0181b018 <__floatdidf+76>: lwz r31,28(r1)
0x0181b01c <__floatdidf+80>: lwz r30,24(r1)
0x0181b020 <__floatdidf+84>: lfs f2,0(r3)
0x0181b024 <__floatdidf+88>: lwz r3,-12(r30)
0x0181b028 <__floatdidf+92>: lfs f0,0(r6)
0x0181b02c <__floatdidf+96>: lfs f12,0(r3)
0x0181b030 <__floatdidf+100>: fsub f0,f1,f0
0x0181b034 <__floatdidf+104>: fmul f0,f0,f2
0x0181b038 <__floatdidf+108>: fadd f0,f0,f12
0x0181b03c <__floatdidf+112>: fadd f1,f13,f0
0x0181b040 <__floatdidf+116>: addi r1,r1,32
0x0181b044 <__floatdidf+120>: mtlr r0
0x0181b048 <__floatdidf+124>: blr
(In reply to comment #14)
My interpretation of this is that the "lwz r30,24(r1)" is simply out
of sequence and should be later in the sequence:
> 0x0181b01c <__floatdidf+80>: lwz r30,24(r1) <<<=== not here
> 0x0181b020 <__floatdidf+84>: lfs f2,0(r3)
> 0x0181b024 <__floatdidf+88>: lwz r3,-12(r30)
> 0x0181b028 <__floatdidf+92>: lfs f0,0(r6)
> 0x0181b02c <__floatdidf+96>: lfs f12,0(r3)
> 0x0181b030 <__floatdidf+100>: fsub f0,f1,f0
> 0x0181b034 <__floatdidf+104>: fmul f0,f0,f2
> 0x0181b038 <__floatdidf+108>: fadd f0,f0,f12
> 0x0181b03c <__floatdidf+112>: fadd f1,f13,f0
lwz r30,24(r1) <<<<<<<<<<=============== instead here
> 0x0181b040 <__floatdidf+116>: addi r1,r1,32
> 0x0181b044 <__floatdidf+120>: mtlr r0
> 0x0181b048 <__floatdidf+124>: blr
(addresses not adjusted for the change in the above)
So that r30 would be restored between the main activity being
finished and r1 being adjusted (popping the stack frame). Of
course any place after the last local r30 use [after -12(r30)]
up to where I show it would also work.
I reported this issue as a comment on 26519 because I'm guessing
that the ordering problem is tied to trying to trying to avoid the
stack-handling ABI violations for FreeBSD: another incompleteness
in the coverage of the stack handling. If it is not that then a
separate submittal would likely be appropriate.
Since I'm allowed to I'm going to change the status to reopened so
that it does not look to have been fully fixed. Let me know if this
is inappropriate since I would not be the one fixing it.
FYI: FreeBSD has merged clang/llvm 4.0 into stable/11 (-r316423)
in addition to head (12) some time ago. release/11.1.0 will be
clang/llvm 4.0 based.
TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc still does not
handle thrown exceptions in this clang version.
TARGET_ARCH=powerpc still has the R30 vs. tack handling vs.
floating point code problem: restored R30 and then generates
floating point cleanup code that expects R30 to not have
changed yet.
(That last is why this was reopened, although at the time
the clang was an earlier version.)
Here's what I tried:
--- float-r30.c ---
double __floatdidf(long long int a) {
static const double twop52 = 4503599627370496.0;
static const double twop32 = 4294967296.0;
union { long long int x; double d; } low = { .d = twop52 };
const double high = (int)(a >> 32) * twop32;
low.x |= a & 0x00000000ffffffffL;
const double result = (high - twop52) + low.d;
return result;
}
-------------------
clang -target powerpc -S -O2 float-r30.c -fpic
Output:
--- float-r30.s ---
.text
.file "float-r30.c"
.section .rodata.cst4,"aM",@progbits,4
.p2align 2
.LCPI0_0:
.long 1501560836 # float 4.50360177E+15
.LCPI0_1:
.long 1333788672 # float 4.2949673E+9
.LCPI0_2:
.long 3649044480 # float -4.50359963E+15
.text
.globl __floatdidf
.p2align 2
.type __floatdidf,@function
__floatdidf: # @__floatdidf
.Lfunc_begin0:
# BB#0: # %entry
mflr 0
stw 0, 4(1)
stwu 1, -32(1)
stw 31, 28(1)
stw 30, 24(1)
bl _GLOBAL_OFFSET_TABLE_@local-4
mr 31, 1
lis 5, 17200
xoris 3, 3, 32768
mflr 30
stw 5, 8(31)
stw 3, 12(31)
lwz 3, .LCPI0_1@GOT(30)
lwz 6, .LCPI0_0@GOT(30)
lfd 1, 8(31)
stw 4, 20(31)
stw 5, 16(31)
lfd 13, 16(31)
lwz 0, 36(1)
lwz 31, 28(1)
lwz 30, 24(1)
lfs 2, 0(3)
lwz 3, .LCPI0_2@GOT(30)
lfs 0, 0(6)
lfs 12, 0(3)
fsub 0, 1, 0
fmul 0, 0, 2
fadd 0, 0, 12
fadd 1, 0, 13
addi 1, 1, 32
mtlr 0
blr
.Lfunc_end0:
.size __floatdidf, .Lfunc_end0-.Lfunc_begin0
.ident "clang version 5.0.0 (http://llvm.org/git/clang.git fcb52251e4b32f8f4ab0a4d434c3f9c5a5ffe008) (http://llvm.org/git/llvm.git 624e695cd90a7b570df26159f225f79fc8a86976)"
.section ".note.GNU-stack","",@progbits
-------------------
The r30 is used as a PIC base pointer, and it is restored before it's used.
This is actually correct.
I take it back. I just noticed that it's set to GOT at the entry to the function, not before it.
Committed a fix in r302183.
Opened PR32930 to request merging this into 4.0.1.
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.
Thanks. FreeBSD's head merged it in as -r317810
and I've built and booted -r317820 based on:
A) buildworld built via the updated system-clang 4
B) buildkernel built via gcc 4.2.1
It appears for (A) that even C++ exceptions work to
some extent (possibly fully) and various previously
broken commands now work (no stack problem).
Thanks much. Big improvement.
stable/11 will probably merge the change in
a few days.
Booting a kernel from a system-clang 4 based
buildkernel fails with "exec /sbin/init:
error 13". This is not new. But it does get
to the point of attempting /sbin/init .
(Note: powerpc64 could use the 2 line
patch listed in llvm bugzilla 31716's Comment
1 to get to a somewhat similar state of
usability on FreeBSD. As stands I use a
personal patch for the issue but that is just
me.)
(In reply to Mark Millard from comment #21)
> (In reply to Krzysztof Parzyszek from comment #19)
> > Committed a fix in r302183.
>
> Thanks. FreeBSD's head merged it in as -r317810
> and I've built and booted -r317820 based on:
>
> A) buildworld built via the updated system-clang 4
> B) buildkernel built via gcc 4.2.1
>
> It appears for (A) that even C++ exceptions work to
> some extent (possibly fully) and various previously
> broken commands now work (no stack problem).
I got testing clang vs. gcc 4.2.1 confused: only the
gcc 4.2.1 buildworld ends up with a c++ compiler where
handling thrown C++ exceptions works. When clang++ is
used handling thrown C++ exceptions fails in the program
that results, like before.
(powerpc64 has the same sort of C++ exception handling
issues.)
But the stack handling fix is still a great improvement
for powerpc: thanks again.
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.
buildworld buildkernel makes extensive use
of signals and its failure is how I discovered
the original stack handling problems. I used to
have to patch in so-called "red zone" handling
to avoid the issue.
No more: a running a kernel that was built
without a "red zone" and running a world based
on clang now allows buildworld buildkernel to
complete just fine.
Not needing a "red zone" is a great
improvement.
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.
Bad news: Another code generation error, this
time demonstrated in compiling part of perl5. . .
(I tried to build a port that indirectly
tried build perl5 but perl5's miniperl crashes.)
/usr/obj/portswork/usr/ports/lang/perl5.24/work/perl-5.24.1/numeric.c
has Perl_cast_iv(NV f) for which clang double stores
two different things to one address [24(r1)]. Below
the => lines are the double store, the second
destroying the r30 value that was saved in the first:
Dump of assembler code for function Perl_cast_iv:
0x0196a114 <+0>: mflr r0
0x0196a118 <+4>: stw r0,4(r1)
0x0196a11c <+8>: stwu r1,-32(r1)
0x0196a120 <+12>: stw r31,28(r1)
=> 0x0196a124 <+16>: stw r30,24(r1)
0x0196a128 <+20>: mr r31,r1
0x0196a12c <+24>: mfcr r12
=> 0x0196a130 <+28>: stw r12,24(r31)
Note: r31 == r1 for that second "=>" line.
The return code sequence has a similar
problem: two loads from the same address.
Note: r31 == r1 here too.
=> 0x0196a1bc <+168>: lwz r12,24(r31)
0x0196a1c0 <+172>: lwz r0,36(r1)
0x0196a1c4 <+176>: lwz r31,28(r1)
=> 0x0196a1c8 <+180>: lwz r30,24(r1)
0x0196a1cc <+184>: mtcrf 32,r12
0x0196a1d0 <+188>: addi r1,r1,32
0x0196a1d4 <+192>: mtlr r0
0x0196a1d8 <+196>: blr
The Perl_cast_iv source code looks like:
IV
Perl_cast_iv(NV f)
{
if (f < IV_MAX_P1)
return f < IV_MIN ? IV_MIN : (IV) f;
if (f < UV_MAX_P1) {
#if CASTFLAGS & 2
/* For future flexibility allowing for sizeof(UV) >= sizeof(IV) */
if (f < UV_MAX_P1_HALF)
return (IV)(UV) f;
f -= UV_MAX_P1_HALF;
return (IV)(((UV) f) | (1 + (UV_MAX >> 1)));
#else
return (IV)(UV) f;
#endif
}
return f > 0 ? (IV)UV_MAX : 0 /* NaN */;
}
Could you attach a preprocessed file?
Attached perl_numeric_powerpc.txt.zip
(65626 bytes, application/zip): .zip of preprocessed numeric.c from perl5 for comments 24 & 25
I posted a patch for master in https://reviews.llvm.org/D33017, but it doesn't apply cleanly to release_40.
I'm attaching the version for 4.0. Could you try if it works?
Attached 0001-PPC-Properly-update-register-save-area-offsets-release-40.patch
(2159 bytes, text/plain): Patch for release_40
(In reply to Krzysztof Parzyszek from comment #28)
> I posted a patch for master in https://reviews.llvm.org/D33017, but it
> doesn't apply cleanly to release_40.
>
> I'm attaching the version for 4.0. Could you try if it works?
I'll see about updating to such a clang and retrying the
port builds with it, for example.
I'll note that Roman Divacky has me trying for clang 4:
# svnlite diff /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
Index: /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
===================================================================
--- /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp (revision
317820)
+++ /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp (working copy)
@@ -1187,7 +1187,7 @@
// For SVR4, don't emit a move for the CR spill slot if we haven't
// spilled CRs.
if (isSVR4ABI && (PPC::CR2 <= Reg && Reg <= PPC::CR4)
- && !MustSaveCR)
+ && (!MustSaveCR && isPPC64))
continue;
// For 64-bit SVR4 when we have spilled CRs, the spill location
for having the CFI information written out for condition code
register use for powerpc (later code not shown). He is looking
into some of the issues for C++ handling thrown exceptions.
(In reply to Krzysztof Parzyszek from comment #28)
> I posted a patch for master in https://reviews.llvm.org/D33017, but it
> doesn't apply cleanly to release_40.
>
> I'm attaching the version for 4.0. Could you try if it works?
I've done a buildworld based on the patch, installed it,
booted it, and: built and installed /usr/ports/lang/perl5.24
with it. That includes the internal miniperl build that
previous created a miniperl that crashed --and the use of
the new miniperl in the later stages of building perl:
no crashes and things seem to be operating.
So it looks good so far for what the patch was intended
to fix (32-bit powerpc context).
(In reply to Mark Millard from comment #31)
>
> So it looks good so far for what the patch was intended
> to fix (32-bit powerpc context).
That is great news. Thank you for doing all this work.
There are a couple of different issues discussed in this PR so I'm not sure if they are all resolved at this point or not. Mark?
(In reply to emaste from comment #34)
> There are a couple of different issues discussed in this PR so I'm not sure
> if they are all resolved at this point or not. Mark?
Overall I expect closing 226519 as fixed again is
appropriate.
Supporting details:
With multiple problems at the same time making notes
indicating what is working vs. what is not can be a
mess: when one thing is fixed things can still fail
overall.
The particular type of issue 26519 directly applies
to is just code generation and its stack handling
properties, not other things tied to C++ exceptions
working (even though I note failures in thrown C++
exceptions).
I've not found any more evidence of the code generation
mishandling the stack or ordering the code in a way that
violates the ABI. My testing has been with FreeBSD's
clang 5 in head for some time now.
However, at this point I've not investigated the /sbin/init
failure that was noted at all to know if code generation
was involved. If code generation for stack operations was
involved then this could be re-opened if closed. (As has
happened before for incomplete stack handling fixes.)
The kernel fails to build for powerpc via clang 5 so I
do not get as far as testing /sbin/init based on clang 5
code generation.
Closing per feedback from Mark.
perl_numeric_powerpc.txt.zip
(65626 bytes, application/zip)0001-PPC-Properly-update-register-save-area-offsets-release-40.patch
(2159 bytes, text/plain)