clangupc / clang-upc

Clang UPC Front-End
https://clangupc.github.io/
Other
16 stars 5 forks source link

[PPC] upc_resetphase() fails - argument mismatch #45

Closed nenadv closed 8 years ago

nenadv commented 10 years ago

I rebuilt PPC version and noticed the following failure. Here is the simplified version of the test:

include <stdio.h>
#include <upc.h>
#include <gula.h>

shared [5] int a[5*THREADS];
int
main()
{
    shared [5] int *p = &a[3];
    shared [5] int *pr;

    pr = upc_resetphase(p); /* should reset phase */

    if (upc_phaseof(p) != 3 || upc_phaseof(pr) != 0)
        GULA_FAIL("failed in upc_phaseof() check - 2");

    upc_barrier;

    if (!MYTHREAD)
        printf ("Passed.\n");
}

When upc_resetphase() is called, register R4 is set to the UPC pointer value, or the value is being placed on the stack:

12          pr = upc_resetphase(p); /* should reset phase */
(gdb) x/10i $pc
=> 0x10001c18 <main+264>:       std     r3,256(r31)
   0x10001c1c <main+268>:       std     r3,56(r1)
   0x10001c20 <main+272>:       ld      r4,256(r31)
   0x10001c24 <main+276>:       addi    r3,r31,264
   0x10001c28 <main+280>:       bl      0x100021ac <.upc_resetphase>

but upc_resetphase() expects pointer in R3:

=> 0x100021ac <.upc_resetphase>:        std     r3,-16(r1)
   0x100021b0 <.upc_resetphase+4>:      ld      r3,-16(r1)
   0x100021b4 <.upc_resetphase+8>:      std     r3,-24(r1)
   0x100021b8 <.upc_resetphase+12>:     ld      r3,-24(r1)
   0x100021bc <.upc_resetphase+16>:     lis     r4,-16
   0x100021c0 <.upc_resetphase+20>:     and     r3,r3,r4
   0x100021c4 <.upc_resetphase+24>:     std     r3,-24(r1)
   0x100021c8 <.upc_resetphase+28>:     ld      r3,-24(r1)
   0x100021cc <.upc_resetphase+32>:     blr

I recall that in GUPC we had to change PPC ABI for something like this.

PHHargrove commented 10 years ago

Nenad,

PPC32 or PPC64 target?

For PPC64 the ABI says to pass in R3, which matches the callee code you show. For PPC32 the ABI should pass a uint64_t split across R3 and R4. Your caller code doesn't look like either case.

nenadv commented 10 years ago

This is PPC64 target. IR on the caller side shows the following:

%28 = load %__upc_shared_pointer_type* %p, align 8, !dbg !30
store %__upc_shared_pointer_type %28, %__upc_shared_pointer_type* %3, 
align 8, !dbg !30
call void @upc_resetphase(%__upc_shared_pointer_type* sret %2, 
%__upc_shared_pointer_type* byval %3), !dbg
!30
%29 = load %__upc_shared_pointer_type* %2, align 8, !dbg !30

The above seems like we want to store the shared pointer and pass the address of it.

PHHargrove commented 10 years ago

The above seems like we want to store the shared pointer and pass the address of it.

Ah, yes. That does appear to be what the caller has done in the 1st and 4th lines of code (assuming r31 is a pointer to the space for spills). However the PPC64 ABI says has both scalars and small fixed-length aggregates passed in registers, making this calling sequence erroneous even if the shared pointer type were an opaque 64, 96 or 128-bit struct. See http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html#PARAM-PASS

I am not 100% sure what the 2nd and 3rd lines are expected to accomplish, but based on the note at the bottom of the page I referenced above, it looks like there is a copy being made in the parameter save area. This could be for use by a debugger, or it may indicate that no prototype for upc_resetphase() was in scope?

PHHargrove commented 10 years ago

Reading the ABI docs again I suspect the issue is that scalars and fixed-length aggregates (upto 64 bytes) are passed by value and variable-length ones are passed by reference. I am going to guess that the upc shared pointer type is being misclasified.

nenadv commented 10 years ago

I think that we have a problem with the return value. From the ABI:

"Aggregates or unions of any length, and character strings of length longer than 8 bytes, will be returned in a storage buffer allocated by the caller."

On the caller side we see that R3 has the storage pointer and R4 has the shared pointer. On the "C" side, shared pointer type is a 64 bit integral type and it is returned in register.

On GUPC side we did fix the ABI, but only for the struct representation (I believe to pass by reference always).

PHHargrove commented 10 years ago

I think that we have a problem with the return value. From the ABI [....]

I agree with the diagnosis. I should have read one sub-sub-section farther.

nenadv commented 10 years ago

This patch fixes the problem for packed, but we need to have packed/struct check too and pass the struct by reference, but I don't see that option being saved.

diff --git a/lib/CodeGen/TargetInfo.cpp b/lib/CodeGen/TargetInfo.cpp
index 59e947b..6f3025c 100644
--- a/lib/CodeGen/TargetInfo.cpp
+++ b/lib/CodeGen/TargetInfo.cpp
@@ -2930,6 +2930,10 @@ PPC64_SVR4_ABIInfo::classifyReturnType(QualType RetTy) const {
   if (RetTy->isAnyComplexType())
     return ABIArgInfo::getDirect();

+  if (RetTy->hasPointerToSharedRepresentation()) {
+    return ABIArgInfo::getDirect();
+  }
+
   if (isAggregateTypeForABI(RetTy))
     return ABIArgInfo::getIndirect(0);

Steven, I only see these options saved:

LANGOPT(UPCThreads, 32, 0, "number of threads")                                 
LANGOPT(UPCPhaseBits, 8, 20, "number of phase bits")                            
LANGOPT(UPCThreadBits, 8, 10, "number of thread bits")                          
LANGOPT(UPCAddrBits, 8, 34, "number of addr bits")                              
LANGOPT(UPCVaddrFirst, 1, 0, "UPC pointer field order")                         
LANGOPT(UPCInlineLib, 1, 0, "inline the UPC runtime")                           
LANGOPT(UPCPreInclude, 1, 0, "pre-include UPC runtime header files")
PHHargrove commented 10 years ago

I don't see that option being saved

I was looking for that in the past, too. Here is what I found several instances of in CGExprUPC.cpp:

llvm::Value *CodeGenFunction::EmitUPCPointerGetPhase(llvm::Value *Pointer) {
  const LangOptions& LangOpts = getContext().getLangOpts();
  unsigned PhaseBits = LangOpts.UPCPhaseBits;
  unsigned ThreadBits = LangOpts.UPCThreadBits;
  unsigned AddrBits = LangOpts.UPCAddrBits;
  if (PhaseBits + ThreadBits + AddrBits == 64) {
    // do packed stuff
  } else {
    // do struct stuff
  }
}
nenadv commented 10 years ago

Yes, I added this to my last fix:

+  if (RetTy->hasPointerToSharedRepresentation()) {
+    if (getContext().getLangOpts().UPCAddrBits == 64)
+      return ABIArgInfo::getIndirect(0);
+    else 
+      return ABIArgInfo::getDirect();
+  }
+

But, we'll probably need struct/packed option if we are going to support 32 bits on PPC platform.

I took a look at our PPC change for struct representation where we made sure that we always pass by reference. Otherwise you need to worry about SVR4 ABI where small structures are passed in registers, etc.. And by default, GCC uses SVR4, but AIX and Darwin are using AIX ABI where structures are always passed by reference.

nenadv commented 10 years ago

Fixed with https://github.com/Intrepid/clang-upc/commit/e710cb667e36f65b76d925be8fd027c5a3a22d27

nenadv commented 8 years ago

After merge to 3.5/3.6 it seems that the above fix does not work for struct pointer representation. Intrepid's test18 fails on upc_resetphase() with similar symptoms. Looking into llvm IR code:

I tried to remove indirect return, but then something else broke.

nenadv commented 8 years ago

ABI for PPC has been changed to v2. By default they return 16 bytes aggregates in two registers (dorect). Removed the previous patch.

Fixed by this commit 4f500bf26a9175643c6567b71a37259cb9996dbf