clangupc / clang-upc

Clang UPC Front-End
https://clangupc.github.io/

MIPS64 barrier failures #95

Open PHHargrove opened 8 years ago

PHHargrove commented 8 years ago

I am currently able to build clang-upc for MIPS64, both big- and little-endian, on Linux. Currently libupc only builds for the "n64" ABI.

Running clang-upc "native" (not w/ the Berkeley UPCR) with the Berkeley UPC test harness reveals that over 300 tests fail at runtime with messages like the following:

./cg-W: UPC error: UPC barrier identifier mismatch
thread 0 terminated with signal: 'Aborted'

This is reproducible on gcc22 (big-endian) and gcc{23,24} (little-endian) of the GCC CFarm. On the little-endian systems, I built with:

cmake \
        -DCMAKE_INSTALL_PREFIX:PATH=<SOMETHING> \
        -DLLVM_TARGETS_TO_BUILD:=Mips \
        -DCMAKE_BUILD_TYPE:=MinSizeRel \
        -DCMAKE_C_COMPILER=mipsel-linux-gnu-gcc-4.9 \
        -DCMAKE_CXX_COMPILER=mipsel-linux-gnu-g++-4.9 \
        -DCMAKE_C_FLAGS=-mabi=64 \
        -DCMAKE_CXX_FLAGS=-mabi=64 \
        -DCMAKE_ASM_FLAGS=-mabi=64 \
        -DLLVM_DEFAULT_TARGET_TRIPLE=mips64el-linux-gnu

On the big-endian system (gcc22) the system gcc/g++ is 4.6, which is too old to build clang-3.8. I had to build a newer gcc/g++ (I chose 4.9 to match the little-endian systems). That required that I build gmp, mpfr and mpc. That, in turn, required that I track down a patch to fix builds of mpfr on MIPS w/ gcc-4. So, you probably want to avoid trying to reproduce there. If you do want to try, I can probably open perms on my install of gcc-4.9 for you.

PHHargrove commented 8 years ago

I am looking into this issue.

I have GNU UPC tests running on the same platforms right now, and think it likely the same bug is present there.

An initial look at upc_sync.h, compared with the nearest equivalents in GASNet, GCC's sync atomics, and the Linux kernel, suggests that libupc is incorrect in its assumption that MIPS does not require a Read Fence.
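
For readers unfamiliar with the term, here is a minimal sketch of where a read fence sits in a flag-and-payload handoff. It uses C11 atomics rather than the macros in upc_sync.h, and the names `payload`, `ready`, `publish` and `try_consume` are hypothetical, chosen only for illustration; this is not libupc code.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data, written before the flag is set */
static atomic_bool ready;  /* flag polled by the reader; starts out false */

/* Writer: store the data, then a write (release) fence, then set the flag. */
static void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: once the flag is seen, a read (acquire) fence keeps the load of
   payload from being satisfied ahead of the load of the flag. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

On a weakly ordered machine, dropping the acquire fence would allow the reader to observe the flag yet still read a stale payload.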

PHHargrove commented 8 years ago

So far I only have ABI=n64 builds of clang-upc and ABI=n32 builds of GNU UPC. The GNU UPC builds are not showing this error, but since the ABIs are different I cannot yet be sure if that is meaningful.

I continue to investigate.

PHHargrove commented 8 years ago

With ABI=n64 builds of GNU UPC I still don't see this error, despite the fact that upc_sync.h and upc_barrier.upc in the respective runtimes are nearly identical. I may have to give up on this issue as being too far outside my expertise (and because MIPS is likely to be of relatively low importance).

PHHargrove commented 8 years ago

I have completed testing GNU UPC and found no equivalent of this issue.

Adding the possibly-missing read fence in runtime/libupc/smp/upc_sync.h does not resolve this problem.

nenadv commented 8 years ago

Right now libupc is being compiled with -Os (optimization for size). I was easily able to reproduce the problem with Intrepid's 'test17', and it seems that the barrier fails on negative ID values (in this case -1, which is used in the UPC lock implementation; test17 is the first test to use locks).

The test passes if libupc is compiled with -O0.

The test fails even with only one thread.

GDB does not work on the MIPS machine gcc23 (I'll try to build a new one), so I was not able to simply debug it. However, after adding some print statements, it seems that this trivial check fails:

upc_barrier.upc

  /* Check the barrier ID with the one from the notify phase.  */
  if (barrier_id != INT_MIN && __upc_barrier_id != INT_MIN &&
      __upc_barrier_id != barrier_id)
    {
      __upc_fatal ("UPC barrier identifier mismatch");
    }
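
To make the intended behavior of that check concrete, here is a small standalone model of the condition. The helper name `barrier_id_mismatch` is hypothetical, and reading INT_MIN as the marker for a barrier without an explicit ID is an assumption inferred from the condition itself, not something stated in this thread.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the quoted condition.
   notify_id plays the role of __upc_barrier_id (recorded in the notify phase),
   wait_id plays the role of barrier_id (passed to the wait phase).
   Assumption: INT_MIN means "no explicit ID", so it matches anything. */
static bool barrier_id_mismatch(int notify_id, int wait_id)
{
    return wait_id != INT_MIN
        && notify_id != INT_MIN
        && notify_id != wait_id;
}
```

With correct 32-bit comparisons, `barrier_id_mismatch(-1, -1)` is false, so a barrier using ID -1 on both sides should never reach `__upc_fatal`.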

nenadv commented 8 years ago

I tried to rearrange the code with no success. The code generated in a bad case:

   1200039f4:   0240282d        move    a1,s2
   1200039f8:   8e220000        lw      v0,0(s1)
   1200039fc:   10500006        beq     v0,s0,120003a18 <$BB2_4>

I wonder whether the processor has a load delay slot, so that v0 has not yet been loaded when the conditional branch reads it. The optimized GUPC version has another instruction between the lw and the beq.

Maybe we are not building clang correctly.

nenadv commented 8 years ago

Since gdb was segfaulting on gcc23, I had to add segfault instructions (*((volatile int *)0) = 0), generate a core file, and review the registers, stack, etc. The core of the problem is in this example:

extern void x (int);
int
main ()
{
  x (-3);
  upc_barrier (-3);
}

that generates this code on MIPS:

        ld      $25, %call16(my_upc_barrier)($gp)
        jalr    $25
        daddiu  $4, $zero, -3

        daddiu  $1, $zero, 1
        dsll    $1, $1, 32
        ld      $25, %call16(__upc_barrier)($gp)
        jalr    $25
        daddiu  $4, $1, -3

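The call to __upc_barrier passes -3 in the lower 32 bits of the register, but the upper bits (32-63) end up zero: the sequence loads 1, shifts it left by 32, and then adds -3, which leaves 0x00000000FFFFFFFD. In other words, the argument is zero-extended, whereas the first call passes it sign-extended.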
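
The register arithmetic can be replayed in portable C. This is only a sketch of the two instruction sequences quoted above, using hypothetical helper names; it models the 64-bit register contents and is not code from libupc or clang.

```c
#include <assert.h>
#include <stdint.h>

/* Models the second sequence, for a small negative immediate such as -3:
   daddiu $1,$zero,1 ; dsll $1,$1,32 ; daddiu $4,$1,id */
static uint64_t reg_as_emitted(int32_t id)
{
    uint64_t r = 1;              /* daddiu $1, $zero, 1 */
    r <<= 32;                    /* dsll   $1, $1, 32   */
    r += (uint64_t)(int64_t)id;  /* daddiu sign-extends its immediate */
    return r;
}

/* Models the first sequence: daddiu $4,$zero,id */
static uint64_t reg_sign_extended(int32_t id)
{
    return (uint64_t)(int64_t)id;
}

static void check_minus_three(void)
{
    assert(reg_as_emitted(-3)    == UINT64_C(0x00000000FFFFFFFD));
    assert(reg_sign_extended(-3) == UINT64_C(0xFFFFFFFFFFFFFFFD));
    /* The low 32 bits agree ... */
    assert((uint32_t)reg_as_emitted(-3) == (uint32_t)reg_sign_extended(-3));
    /* ... but a comparison of the full 64-bit registers sees a difference. */
    assert(reg_as_emitted(-3) != reg_sign_extended(-3));
}
```

This would be consistent with the earlier listing, where a value loaded with lw (which sign-extends into the 64-bit register) is compared by beq against the incoming argument register.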

The LLVM IR looks like this (note that only the call to x carries signext on its argument):

define i32 @upc_main() #0 {
  call void @x(i32 signext -3)
  call void @__upc_barrier(i32 -3)
  ret i32 0
}

On x86_64 there is no difference in the LLVM IR between these two calls.