Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Clang 4.0.0's "Target: powerpc-unknown-freebsd11.0" code generation is violating the SVR4 ABI (SEGV can result) #26518

Closed Quuxplusone closed 6 years ago

Quuxplusone commented 8 years ago
Bugzilla Link PR26519
Status RESOLVED FIXED
Importance P normal
Reported by Mark Millard (marklmi26-fbsd@yahoo.com)
Reported on 2016-02-07 14:35:42 -0800
Last modified on 2017-11-12 11:42:57 -0800
Version 4.0
Hardware Other FreeBSD
CC brad@comstyle.com, chmeeedalf@gmail.com, emaste@freebsd.org, kparzysz@quicinc.com, llvm-bugs@lists.llvm.org, rdivacky@freebsd.org
Fixed by commit(s)
Attachments perl_numeric_powerpc.txt.zip (65626 bytes, application/zip)
0001-PPC-Properly-update-register-save-area-offsets-release-40.patch (2159 bytes, text/plain)
Blocks PR25780
Blocked by
See also
Comparing clang 3.8.0 (via FreeBSD's projects/clang380-import svn) generated
code for TARGET_ARCH=powerpc (32-bit) to gcc 4.2.1 generated code. . .

clang 3.8.0 based Str_Match preamble (from make):

0x181a4a8 <Str_Match>:  mflr    r0
0x181a4ac <Str_Match+4>:    stw     r31,-4(r1) # Clang's frame pointer (r31)
                                                   # saved before stack pointer changed.
0x181a4b0 <Str_Match+8>:    stw     r0,4(r1)   # lr saved before stack pointer
changed.
0x181a4b4 <Str_Match+12>:   stwu    r1,-32(r1) # Stack pointer finally saved and
                                                   # changed.
0x181a4b8 <Str_Match+16>:   mr      r31,r1     # r31 is the frame pointer under
clang.
0x181a4bc <Str_Match+20>:   stw     r30,24(r31)

gcc 4.2.1 based Str_Match preamble:

0x1819cb8 <Str_Match>:  mflr    r0
0x1819cbc <Str_Match+4>:    stwu    r1,-32(r1) # Stack pointer saved and changed
first.
0x1819cc0 <Str_Match+8>:    stw     r31,28(r1) # r31 saved after stack pointer
changed.
0x1819cc4 <Str_Match+12>:   mr      r31,r3     # gcc 4.2.1 does not reserve
                                                   # r31 for use as a frame pointer.
0x1819cc8 <Str_Match+16>:   stw     r30,24(r1)
0x1819ccc <Str_Match+20>:   stw     r0,36(r1)  # lr saved after stack pointer
changed.

Picking a different example for postamble code, showing just clang 3.8.0's code:

0x1801b8c <Buf_AddBytes+104>:   lwz     r30,24(r31)
0x1801b90 <Buf_AddBytes+108>:   lwz     r29,20(r31)
0x1801b94 <Buf_AddBytes+112>:   lwz     r28,16(r31)
0x1801b98 <Buf_AddBytes+116>:   lwz     r27,12(r31)
0x1801b9c <Buf_AddBytes+120>:   lwz     r26,8(r31)
0x1801ba0 <Buf_AddBytes+124>:   addi    r1,r1,32   # Stack pointer adjusted first
0x1801ba4 <Buf_AddBytes+128>:   lwz     r0,4(r1)
0x1801ba8 <Buf_AddBytes+132>:   lwz     r31,-4(r1) # Then Frame Pointer load
happens
                                                   # "outside" the new stack range.
0x1801bac <Buf_AddBytes+136>:   mtlr    r0
0x1801bb0 <Buf_AddBytes+140>:   blr

In other words: clang 3.8.0's generated 32-bit powerpc code is based on there
being a safe scratch area below the stack ("below" by memory address). So
similar to the 224 byte "red zone" area that 32-bit AIX powerpc and 32-bit
Darwin powerpc use.

I'm told by Nathan Whithorn that "the 32-bit ELF ABI does not require any such
red zone" and so that clang 3.8.0 is violating the ABI that is supposed to be
involved.

I do not have specific document or section references (or web links) to list
for the ABI details at this time. I'm just reporting what I'm told by FreeBSD
folks.

I used "make" code as the example above because something like "make -j 6
buildworld" uses signal delivery extensively (SIGCHLD) and such a build its
gets a SEGV in a make process within the 1st few minutes (on a "Quad core G5
PowerMac" using a FreeBSD powerpc 32-bit installation). The signal delivery is
sometimes replacing the value at "-4(r1)" in the above code before it is loaded
back into r31 (the clang 3.8.0 framepointer register for powerpc as it is
currently generating code). The FreeBSD signal delivery for 32-bit powerpc does
not have/use a "red zone" on the smaller-address side of the stack.
Quuxplusone commented 8 years ago
FYI, additional ABI issue possibility:

On FreeBSD 11.0-CURRENT (well, projects/clang380-import) when I try "clang++ -
std=c++11 -dM -E just_main.cpp" I get. . .

#define __BIGGEST_ALIGNMENT__ 8
. . .
#define __NATURAL_ALIGNMENT__ 1

on both TARGET_ARCH=powerpc and TARGET_ARCH=powerpc64 contexts.

But for the special port devel/powerpc64-gcc used for modern cross compiles of
FreeBSD ( /usr/local/bin/powerpc64-portbld-freebsd11.0-g++ -std=c++11 -dM -E
just_main.cpp ) I get:

#define __BIGGEST_ALIGNMENT__ 16

(Natural is not referenced. There is no such special port for 32-bit powerpc to
see its output.)

So there may be an ABI alignment mismatch someplace as well unless the ABI has
some optional-status alignment rules.

Without a ABI reference document I'm unsure which __BIGGEST_ALIGNMENT__ would
be correct for the FreeBSD powerpc ABI (if either one is). (Similarly for
powerpc64.)

Side notes:

powerpc64-portbld-freebsd11.0-g++ also reports:

#define __CMODEL_MEDIUM__ 1
. . .
#define __cpp_rvalue_reference 200610

But clang++ 3.8.0 reports nothing analogous to the first and:

#define __cpp_rvalue_references 200610

(gcc and clang have spelling differences here.)
Quuxplusone commented 8 years ago
"FreeBSD's" Justin Hibbits added a note in the FreeBSD bug entry (206990) that
I repeat here:

There is no provision in the ABI for a redzone in 32-bit powerpc.  LLVM is
broken for 32-bit PowerPC regarding this, and there are comments in the source
code to this regard, to the effect:

(PPCFrameLowering.cpp):
    // FIXME: On PPC32 SVR4, we must not spill before claiming the stackframe.

If a signal interrupts the thread at the precise wrong time (when creating the
stack frame, but before adjusting %r1), Bad Things will happen.
Quuxplusone commented 8 years ago

It appears that I originally forgot to set the Component for my submittal. Looking around I estimate that LLVM Codegen is the closest category: PPCFrameLowering.cpp is not under tools/clang/ at all so C++ would seem to be over specific.

Quuxplusone commented 8 years ago
(In reply to comment #1)

Just a note on __cpp_rvalue_reference vs. __cpp_rvalue_references :

> Side notes:
>
> powerpc64-portbld-freebsd11.0-g++ also reports:
>
> #define __CMODEL_MEDIUM__ 1
> . . .
> #define __cpp_rvalue_reference 200610
>
> But clang++ 3.8.0 reports nothing analogous to the first and:
>
> #define __cpp_rvalue_references 200610
>
> (gcc and clang have spelling differences here.)

https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00175.html says:

> as PR71214 points out gcc uses a wrong feature test macro for C++11
> rvalue references: __cpp_rvalue_reference instead of the correct
> __cpp_rvalue_references.

So this specific point is a gcc problem but libraries targeting supporting what
gcc has done historically will now have 2 names to test the value of.
Quuxplusone commented 8 years ago

Patch for review: https://reviews.llvm.org/D24093

Quuxplusone commented 8 years ago

Committed in r280705.

Quuxplusone commented 8 years ago
(In reply to comment #6)
> Committed in r280705.

Thanks Krzysztof.

Dimitry Andric (dim at FreeBSD.org) has written:

> I merged the upstream fix to projects/clang390-import:
>
> https://svnweb.freebsd.org/changeset/base/305686

So FreeBSD head (current) for 12 will be adopting your changes.

As for my activity:

I'll not have access to powerpc64s/powerpcs for a few weeks yet.
Quuxplusone commented 8 years ago
(In reply to comment #6)
> Committed in r280705.

Did the commit also fix the stack pointer adjustment timing for the post-amble
code?

The wording that I see in the review and commit talks about the "claim" side of
things, which I interpret to be for the pre-amble side of things. If I
interpret the code right that is the side fixed. (I'm not clang/llvm code
literate so I could easily be wrong.)

My original submittal also noted the timing problem existed on the post-amble
side in 3.8.0's code generation:

0x1801b8c <Buf_AddBytes+104>:   lwz     r30,24(r31)
0x1801b90 <Buf_AddBytes+108>:   lwz     r29,20(r31)
0x1801b94 <Buf_AddBytes+112>:   lwz     r28,16(r31)
0x1801b98 <Buf_AddBytes+116>:   lwz     r27,12(r31)
0x1801b9c <Buf_AddBytes+120>:   lwz     r26,8(r31)
0x1801ba0 <Buf_AddBytes+124>:   addi    r1,r1,32   # Stack pointer adjusted first
0x1801ba4 <Buf_AddBytes+128>:   lwz     r0,4(r1)
0x1801ba8 <Buf_AddBytes+132>:   lwz     r31,-4(r1) # Then Frame Pointer load
happens
                                                   # "outside" the new stack range.
0x1801bac <Buf_AddBytes+136>:   mtlr    r0
0x1801bb0 <Buf_AddBytes+140>:   blr

If such code can still be generated there is is a time frame needing a red-zone
to protect stack contents.

Hopefully I'm just wrong and this was fixed too.
Quuxplusone commented 8 years ago

The post-amble has not been fixed.

Quuxplusone commented 8 years ago

The epilogue part of the fix: https://reviews.llvm.org/D24466

Hopefully there is nothing else missing.

Quuxplusone commented 8 years ago

Committed in r282174.

Quuxplusone commented 7 years ago
(In reply to comment #11)
> Committed in r282174.

projects/clang390-import has been merged to head for FreeBSD --and is
now scheduled for an MFC from there into stable/11 in a month. It does
not yet have this fix.

But there is a chance for later 11.x-RELEASE's to support powerpc64 and/or
powerpc via clang, not just 12.0-RELEASE. (I'm not implying in a month
specifically, just that stable/11 might progress as llvm does as long
as the fixes backport into FreeBSD's 3.9.0 reasonably --and that 11.x
would pick up such changes from stable/11 .)

I'm still hoping that llvm 26519's and 26970's fixes will be applied to
clang 3.9.0 in FreeBSD's head soon. That would enable some significant
testing. I could try buildworld for powerpc64 and I could remove my "red
zone" signal delivery hack for powerpc to see if it is no longer needed
[lack of ABI violations].

[I want to be able to have good runs/results of the kyua tests before I
try to deal with testing buildkernel. This requires signal handling and
C++ exception handling to be working well even if the kernel does not
require all of that.]
Quuxplusone commented 7 years ago

For reference this is now in FreeBSD as https://svnweb.freebsd.org/changeset/base/309147

Quuxplusone commented 7 years ago
(In reply to comment #13)
> For reference this is now in FreeBSD as
> https://svnweb.freebsd.org/changeset/base/309147

Unfortunately there are problems handling use of r30, at least when floating
point is involved.

The used code from the compliler_rt __floatdidf seems to be just wrong in part
of its use of r30. The below initially uses the "df -m" example failure and its
code.

At first is saves r30 on the stack as part of the function preamble:

Dump of assembler code for function __floatdidf:
0x018029b8 <__floatdidf+0>: mflr    r0
0x018029bc <__floatdidf+4>: stw     r0,4(r1)
0x018029c0 <__floatdidf+8>: stwu    r1,-32(r1)
0x018029c4 <__floatdidf+12>:    stw     r31,28(r1)
0x018029c8 <__floatdidf+16>:    stw     r30,24(r1)

Later it uses r30 for other things.

Then it restores it from the stack:

0x01802a08 <__floatdidf+80>:    lwz     r30,24(r1)

What it does with r30 next effectively makes r30 a parameter to the routine and
not just saved/restored:

0x01802a10 <__floatdidf+88>:    lwz     r3,-8(r30)

In the above the r30 from the caller's context is being used as the base for
accessing memory and putting that memory content in r3.

What it does next with r3 is one of the points where SIGSEGV is happening: For
"df -m" r3 ends up being 0 and the following fails.

0x01802a18 <__floatdidf+96>:    lfs     f12,0(r3)

In the fsck_ufs example it is the same sort of thing but r30 happens to be
initially 0 so it fails at an earlier stage.

Again there is the preamble that saves r30:

0x0181afcc <__floatdidf+0>: mflr    r0
0x0181afd0 <__floatdidf+4>: stw     r0,4(r1)
0x0181afd4 <__floatdidf+8>: stwu    r1,-32(r1)
0x0181afd8 <__floatdidf+12>:    stw     r31,28(r1)
0x0181afdc <__floatdidf+16>:    stw     r30,24(r1)

Later it uses r30 for other things.

Then it restores it from the stack:

0x0181b01c <__floatdidf+80>:    lwz     r30,24(r1)

What it does with r30 next effectively makes r30 a parameter to the routine and
not just saved/restored:

0x0181b024 <__floatdidf+88>:    lwz     r3,-12(r30)

Again: In the above the r30 from the caller's context is being used as the base
for accessing memory and putting that memory content in r3.

But this time it turns out that r30 is 0 and the above also fails.

If the code had gotten that far it would have done the same thing with r3 that
"df -m" did in its failure:

0x0181b02c <__floatdidf+96>:    lfs     f12,0(r3)

For reference compiler-rt/lib/builtins/floatdidf.c has:

COMPILER_RT_ABI double
__floatdidf(di_int a)
{
    static const double twop52 = 4503599627370496.0; // 0x1.0p52
    static const double twop32 = 4294967296.0; // 0x1.0p32

    union { int64_t x; double d; } low = { .d = twop52 };

    const double high = (int32_t)(a >> 32) * twop32;
    low.x |= a & INT64_C(0x00000000ffffffff);

    const double result = (high - twop52) + low.d;
    return result;
}

and the matching assembler code for the fsck_ufs example is (from gdb):

0x0181afcc <__floatdidf+0>: mflr    r0
0x0181afd0 <__floatdidf+4>: stw     r0,4(r1)
0x0181afd4 <__floatdidf+8>: stwu    r1,-32(r1)
0x0181afd8 <__floatdidf+12>:    stw     r31,28(r1)
0x0181afdc <__floatdidf+16>:    stw     r30,24(r1)
0x0181afe0 <__floatdidf+20>:    bl      0x182e96c <.got+20>
0x0181afe4 <__floatdidf+24>:    mr      r31,r1
0x0181afe8 <__floatdidf+28>:    xoris   r3,r3,32768
0x0181afec <__floatdidf+32>:    lis     r5,17200
0x0181aff0 <__floatdidf+36>:    mflr    r30
0x0181aff4 <__floatdidf+40>:    stw     r3,12(r31)
0x0181aff8 <__floatdidf+44>:    stw     r5,8(r31)
0x0181affc <__floatdidf+48>:    lwz     r3,-16(r30)
0x0181b000 <__floatdidf+52>:    lwz     r6,-20(r30)
0x0181b004 <__floatdidf+56>:    lfd     f1,8(r31)
0x0181b008 <__floatdidf+60>:    stw     r4,20(r31)
0x0181b00c <__floatdidf+64>:    stw     r5,16(r31)
0x0181b010 <__floatdidf+68>:    lfd     f13,16(r31)
0x0181b014 <__floatdidf+72>:    lwz     r0,36(r1)
0x0181b018 <__floatdidf+76>:    lwz     r31,28(r1)
0x0181b01c <__floatdidf+80>:    lwz     r30,24(r1)
0x0181b020 <__floatdidf+84>:    lfs     f2,0(r3)
0x0181b024 <__floatdidf+88>:    lwz     r3,-12(r30)
0x0181b028 <__floatdidf+92>:    lfs     f0,0(r6)
0x0181b02c <__floatdidf+96>:    lfs     f12,0(r3)
0x0181b030 <__floatdidf+100>:   fsub    f0,f1,f0
0x0181b034 <__floatdidf+104>:   fmul    f0,f0,f2
0x0181b038 <__floatdidf+108>:   fadd    f0,f0,f12
0x0181b03c <__floatdidf+112>:   fadd    f1,f13,f0
0x0181b040 <__floatdidf+116>:   addi    r1,r1,32
0x0181b044 <__floatdidf+120>:   mtlr    r0
0x0181b048 <__floatdidf+124>:   blr
Quuxplusone commented 7 years ago
(In reply to comment #14)

My interpretation of this is that the "lwz r30,24(r1)" is simply out
of sequence and should be later in the sequence:

> 0x0181b01c <__floatdidf+80>:  lwz     r30,24(r1)  <<<=== not here
> 0x0181b020 <__floatdidf+84>:  lfs     f2,0(r3)
> 0x0181b024 <__floatdidf+88>:  lwz     r3,-12(r30)
> 0x0181b028 <__floatdidf+92>:  lfs     f0,0(r6)
> 0x0181b02c <__floatdidf+96>:  lfs     f12,0(r3)
> 0x0181b030 <__floatdidf+100>: fsub    f0,f1,f0
> 0x0181b034 <__floatdidf+104>: fmul    f0,f0,f2
> 0x0181b038 <__floatdidf+108>: fadd    f0,f0,f12
> 0x0181b03c <__floatdidf+112>: fadd    f1,f13,f0
lwz     r30,24(r1)  <<<<<<<<<<=============== instead here
> 0x0181b040 <__floatdidf+116>: addi    r1,r1,32
> 0x0181b044 <__floatdidf+120>: mtlr    r0
> 0x0181b048 <__floatdidf+124>: blr

(addresses not adjusted for the change in the above)

So that r30 would be restored between the main activity being
finished and r1 being adjusted (popping the stack frame). Of
course any place after the last local r30 use [after -12(r30)]
up to where I show it would also work.

I reported this issue as a comment on 26519 because I'm guessing
that the ordering problem is tied to trying to trying to avoid the
stack-handling ABI violations for FreeBSD: another incompleteness
in the coverage of the stack handling. If it is not that then a
separate submittal would likely be appropriate.

Since I'm allowed to I'm going to change the status to reopened so
that it does not look to have been fully fixed. Let me know if this
is inappropriate since I would not be the one fixing it.
Quuxplusone commented 7 years ago
FYI: FreeBSD has merged clang/llvm 4.0 into stable/11 (-r316423)
in addition to head (12) some time ago. release/11.1.0 will be
clang/llvm 4.0 based.

TARGET_ARCH=powerpc64 and TARGET_ARCH=powerpc still does not
handle thrown exceptions in this clang version.

TARGET_ARCH=powerpc still has the R30 vs. tack handling vs.
floating point code problem: restored R30 and then generates
floating point cleanup code that expects R30 to not have
changed yet.

(That last is why this was reopened, although at the time
the clang was an earlier version.)
Quuxplusone commented 7 years ago
Here's what I tried:

--- float-r30.c ---
double __floatdidf(long long int a) {
    static const double twop52 = 4503599627370496.0;
    static const double twop32 = 4294967296.0;

    union { long long int x; double d; } low = { .d = twop52 };

    const double high = (int)(a >> 32) * twop32;
    low.x |= a & 0x00000000ffffffffL;

    const double result = (high - twop52) + low.d;
    return result;
}
-------------------

clang -target powerpc -S -O2 float-r30.c -fpic

Output:

--- float-r30.s ---
        .text
        .file   "float-r30.c"
        .section        .rodata.cst4,"aM",@progbits,4
        .p2align        2
.LCPI0_0:
        .long   1501560836              # float 4.50360177E+15
.LCPI0_1:
        .long   1333788672              # float 4.2949673E+9
.LCPI0_2:
        .long   3649044480              # float -4.50359963E+15
        .text
        .globl  __floatdidf
        .p2align        2
        .type   __floatdidf,@function
__floatdidf:                            # @__floatdidf
.Lfunc_begin0:
# BB#0:                                 # %entry
        mflr 0
        stw 0, 4(1)
        stwu 1, -32(1)
        stw 31, 28(1)
        stw 30, 24(1)
        bl _GLOBAL_OFFSET_TABLE_@local-4
        mr 31, 1
        lis 5, 17200
        xoris 3, 3, 32768
        mflr 30
        stw 5, 8(31)
        stw 3, 12(31)
        lwz 3, .LCPI0_1@GOT(30)
        lwz 6, .LCPI0_0@GOT(30)
        lfd 1, 8(31)
        stw 4, 20(31)
        stw 5, 16(31)
        lfd 13, 16(31)
        lwz 0, 36(1)
        lwz 31, 28(1)
        lwz 30, 24(1)
        lfs 2, 0(3)
        lwz 3, .LCPI0_2@GOT(30)
        lfs 0, 0(6)
        lfs 12, 0(3)
        fsub 0, 1, 0
        fmul 0, 0, 2
        fadd 0, 0, 12
        fadd 1, 0, 13
        addi 1, 1, 32
        mtlr 0
        blr
.Lfunc_end0:
        .size   __floatdidf, .Lfunc_end0-.Lfunc_begin0

        .ident  "clang version 5.0.0 (http://llvm.org/git/clang.git fcb52251e4b32f8f4ab0a4d434c3f9c5a5ffe008) (http://llvm.org/git/llvm.git 624e695cd90a7b570df26159f225f79fc8a86976)"
        .section        ".note.GNU-stack","",@progbits
-------------------

The r30 is used as a PIC base pointer, and it is restored before it's used.
This is actually correct.
Quuxplusone commented 7 years ago

I take it back. I just noticed that it's set to GOT at the entry to the function, not before it.

Quuxplusone commented 7 years ago

Committed a fix in r302183.

Quuxplusone commented 7 years ago

Opened PR32930 to request merging this into 4.0.1.

Quuxplusone commented 7 years ago
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.

Thanks. FreeBSD's head merged it in as -r317810
and I've built and booted -r317820 based on:

A) buildworld built via the updated system-clang 4
B) buildkernel built via gcc 4.2.1

It appears for (A) that even C++ exceptions work to
some extent (possibly fully) and various previously
broken commands now work (no stack problem).

Thanks much. Big improvement.

stable/11 will probably merge the change in
a few days.

Booting a kernel from a system-clang 4 based
buildkernel fails with "exec /sbin/init:
error 13". This is not new. But it does get
to the point of attempting /sbin/init .

(Note: powerpc64 could use the 2 line
patch listed in llvm bugzilla 31716's Comment
1 to get to a somewhat similar state of
usability on FreeBSD. As stands I use a
personal patch for the issue but that is just
me.)
Quuxplusone commented 7 years ago
(In reply to Mark Millard from comment #21)
> (In reply to Krzysztof Parzyszek from comment #19)
> > Committed a fix in r302183.
>
> Thanks. FreeBSD's head merged it in as -r317810
> and I've built and booted -r317820 based on:
>
> A) buildworld built via the updated system-clang 4
> B) buildkernel built via gcc 4.2.1
>
> It appears for (A) that even C++ exceptions work to
> some extent (possibly fully) and various previously
> broken commands now work (no stack problem).

I got testing clang vs. gcc 4.2.1 confused: only the
gcc 4.2.1 buildworld ends up with a c++ compiler where
handling thrown C++ exceptions works. When clang++ is
used handling thrown C++ exceptions fails in the program
that results, like before.

(powerpc64 has the same sort of C++ exception handling
issues.)

But the stack handling fix is still a great improvement
for powerpc: thanks again.
Quuxplusone commented 7 years ago
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.

buildworld buildkernel makes extensive use
of signals and its failure is how I discovered
the original stack handling problems. I used to
have to patch in so-called "red zone" handling
to avoid the issue.

No more: a running a kernel that was built
without a "red zone" and running a world based
on clang now allows buildworld buildkernel to
complete just fine.

Not needing a "red zone" is a great
improvement.
Quuxplusone commented 7 years ago
(In reply to Krzysztof Parzyszek from comment #19)
> Committed a fix in r302183.

Bad news: Another code generation error, this
time demonstrated in compiling part of perl5. . .
(I tried to build a port that indirectly
tried build perl5 but perl5's miniperl crashes.)

/usr/obj/portswork/usr/ports/lang/perl5.24/work/perl-5.24.1/numeric.c

has Perl_cast_iv(NV f) for which clang double stores
two different things to one address [24(r1)]. Below
the => lines are the double store, the second
destroying the r30 value that was saved in the first:

Dump of assembler code for function Perl_cast_iv:
   0x0196a114 <+0>: mflr    r0
   0x0196a118 <+4>: stw     r0,4(r1)
   0x0196a11c <+8>: stwu    r1,-32(r1)
   0x0196a120 <+12>:    stw     r31,28(r1)
=>   0x0196a124 <+16>:  stw     r30,24(r1)
   0x0196a128 <+20>:    mr      r31,r1
   0x0196a12c <+24>:    mfcr    r12
=>   0x0196a130 <+28>:  stw     r12,24(r31)

Note: r31 == r1 for that second "=>" line.

The return code sequence has a similar
problem: two loads from the same address.

Note: r31 == r1 here too.

=>   0x0196a1bc <+168>: lwz     r12,24(r31)
   0x0196a1c0 <+172>:   lwz     r0,36(r1)
   0x0196a1c4 <+176>:   lwz     r31,28(r1)
=>   0x0196a1c8 <+180>: lwz     r30,24(r1)
   0x0196a1cc <+184>:   mtcrf   32,r12
   0x0196a1d0 <+188>:   addi    r1,r1,32
   0x0196a1d4 <+192>:   mtlr    r0
   0x0196a1d8 <+196>:   blr

The Perl_cast_iv source code looks like:

IV
Perl_cast_iv(NV f)
{
  if (f < IV_MAX_P1)
    return f < IV_MIN ? IV_MIN : (IV) f;
  if (f < UV_MAX_P1) {
#if CASTFLAGS & 2
    /* For future flexibility allowing for sizeof(UV) >= sizeof(IV)  */
    if (f < UV_MAX_P1_HALF)
      return (IV)(UV) f;
    f -= UV_MAX_P1_HALF;
    return (IV)(((UV) f) | (1 + (UV_MAX >> 1)));
#else
    return (IV)(UV) f;
#endif
  }
  return f > 0 ? (IV)UV_MAX : 0 /* NaN */;
}
Quuxplusone commented 7 years ago

Could you attach a preprocessed file?

Quuxplusone commented 7 years ago

Attached perl_numeric_powerpc.txt.zip (65626 bytes, application/zip): .zip of preprocessed numeric.c from perl5 for comments 24 & 25

Quuxplusone commented 7 years ago

I posted a patch for master in https://reviews.llvm.org/D33017, but it doesn't apply cleanly to release_40.

I'm attaching the version for 4.0. Could you try if it works?

Quuxplusone commented 7 years ago

Attached 0001-PPC-Properly-update-register-save-area-offsets-release-40.patch (2159 bytes, text/plain): Patch for release_40

Quuxplusone commented 7 years ago
(In reply to Krzysztof Parzyszek from comment #28)
> I posted a patch for master in https://reviews.llvm.org/D33017, but it
> doesn't apply cleanly to release_40.
>
> I'm attaching the version for 4.0.  Could you try if it works?

I'll see about updating to such a clang and retrying the
port builds with it, for example.

I'll note that Roman Divacky has me trying for clang 4:

# svnlite diff /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
Index: /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
===================================================================
--- /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp   (revision
317820)
+++ /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp   (working copy)
@@ -1187,7 +1187,7 @@
       // For SVR4, don't emit a move for the CR spill slot if we haven't
       // spilled CRs.
       if (isSVR4ABI && (PPC::CR2 <= Reg && Reg <= PPC::CR4)
-          && !MustSaveCR)
+          && (!MustSaveCR && isPPC64))
         continue;

       // For 64-bit SVR4 when we have spilled CRs, the spill location

for having the CFI information written out for condition code
register use for powerpc (later code not shown). He is looking
into some of the issues for C++ handling thrown exceptions.
Quuxplusone commented 7 years ago
(In reply to Krzysztof Parzyszek from comment #28)
> I posted a patch for master in https://reviews.llvm.org/D33017, but it
> doesn't apply cleanly to release_40.
>
> I'm attaching the version for 4.0.  Could you try if it works?

I've done a buildworld based on the patch, installed it,
booted it, and: built and installed /usr/ports/lang/perl5.24
with it. That includes the internal miniperl build that
previous created a miniperl that crashed --and the use of
the new miniperl in the later stages of building perl:
no crashes and things seem to be operating.

So it looks good so far for what the patch was intended
to fix (32-bit powerpc context).
Quuxplusone commented 7 years ago
(In reply to Mark Millard from comment #31)
>
> So it looks good so far for what the patch was intended
> to fix (32-bit powerpc context).

That is great news. Thank you for doing all this work.
Quuxplusone commented 7 years ago

The fix has been committed in master in r303257.

I opened PR33070 to request merging it into 4.0.1.

Quuxplusone commented 6 years ago

There are a couple of different issues discussed in this PR so I'm not sure if they are all resolved at this point or not. Mark?

Quuxplusone commented 6 years ago
(In reply to emaste from comment #34)
> There are a couple of different issues discussed in this PR so I'm not sure
> if they are all resolved at this point or not. Mark?

Overall I expect closing 226519 as fixed again is
appropriate.

Supporting details:

With multiple problems at the same time making notes
indicating what is working vs. what is not can be a
mess: when one thing is fixed things can still fail
overall.

The particular type of issue 26519 directly applies
to is just code generation and its stack handling
properties, not other things tied to C++ exceptions
working (even though I note failures in thrown C++
exceptions).

I've not found any more evidence of the code generation
mishandling the stack or ordering the code in a way that
violates the ABI. My testing has been with FreeBSD's
clang 5 in head for some time now.

However, at this point I've not investigated the /sbin/init
failure that was noted at all to know if code generation
was involved. If code generation for stack operations was
involved then this could be re-opened if closed. (As has
happened before for incomplete stack handling fixes.)

The kernel fails to build for powerpc via clang 5 so I
do not get as far as testing /sbin/init based on clang 5
code generation.
Quuxplusone commented 6 years ago

Closing per feedback from Mark.