clangupc / clang-upc

Clang UPC Front-End
https://clangupc.github.io/

test18 failures with -fupc-pts=struct on OpenBSD X86 #66

Open PHHargrove opened 10 years ago

PHHargrove commented 10 years ago

I now have OpenBSD testers for clang-upc on both amd64 and i386, and have chosen to configure with --with-upc-pts=struct for more coverage.

In conducting the initial "smoke test" run of the Intrepid suite I encountered failures of test18 only on the i386 system. In a debug build I get the following failure:

[intrepid/test18_st02]   0sec  20140709_160046  FAILED (CRASH=SIGTERM/NEW)
commandline: [env  UPC_QUIET=1 ./test18_st02 -n 2 ]
PassExpr: passed
FailExpr: rror
--- App stdout ---
--- App stderr ---
./test18_st02: UPC error: Thread number in shared address is out of range
thread 1 terminated with signal: 'Abort trap'

While a non-debug build gets a SEGV instead:

[intrepid/test18_st02]   0sec  20140709_160046  FAILED (CRASH=SIGTERM/NEW)
commandline: [env  UPC_QUIET=1 ./test18_st02 -n 2 ]
PassExpr: passed
FailExpr: rror
--- App stdout ---
--- App stderr ---
./test18_st02: UPC error: Thread number in shared address is out of range
thread 1 terminated with signal: 'Abort trap'

Outputs above show the static-threads builds of the test, but the dynamic threads cases fail in the same manner.

I went on to investigate other 32-bit platforms and found the majority to fail test18 with the struct PTS.

On an x86 build on FreeBSD I get a SEGV:

$ ./a.out -n2
thread 1 terminated with signal: 'Segmentation fault'
Terminated

On an "-m32" build on Mac OS X I see a different failure mode:

$ ./a.out -n2
./a.out: UPC error: Invalid conversion of shared address to local pointer;
thread does not have affinity to shared address
thread 0 terminated with signal: 'Abort trap'
Terminated: 15

On an x86 build on NetBSD I don't see any error.

I don't presently have any 32-bit builds for Linux.

In all of the cases reported above as failing, I have verified that there is no error with the packed PTS representation.

nenadv commented 10 years ago

Confirmed the issue on my VM too. I was not able to duplicate it on a 32-bit Linux machine, which I thought was good because I could compare the generated code. It turns out that the code is completely different: Linux uses xmm registers in the generated code while FreeBSD does not. The error can be duplicated with -O0 and only one thread, which is good for debugging.

The error can be duplicated with this code:

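#include <upc.h>   /* declares upc_resetphase and upc_phaseof */

shared [5] int a_blk5[10*THREADS];
shared [5] int *ptr_to_blk5;

void
test18()
{
  int got;
  int expected;
  /* bug 52: upc_resetphase unimplemented */
  ptr_to_blk5 = upc_resetphase (&a_blk5[1]);
  got = upc_phaseof (ptr_to_blk5);
  expected = 0;
  upc_barrier;
}

/* Minimal driver so the snippet links on its own. */
int
main()
{
  test18();
  return 0;
}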
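
For reference, here is a rough sketch in plain C of the layout arithmetic behind the expected value, assuming the standard UPC block-cyclic distribution. The names elem_phase and elem_thread are hypothetical helpers made up for illustration; they are not UPC library functions and say nothing about how the clang-upc runtime computes these values.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the UPC library): for an array declared
   `shared [block] T a[...]`, return the phase of element i, i.e. its
   position within its block. */
size_t elem_phase(size_t i, size_t block)
{
    return i % block;
}

/* Hypothetical helper: thread with affinity to element i, assuming blocks
   are dealt out to threads round-robin (block-cyclic distribution). */
size_t elem_thread(size_t i, size_t block, size_t threads)
{
    return (i / block) % threads;
}
```

For &a_blk5[1] with block size 5 this gives phase 1 with affinity to thread 0. upc_resetphase is specified to return a pointer to the same element with phase 0, which is why the test sets expected = 0.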

I think the issue is related to an optimization where FreeBSD does not save/use the frame pointer. Instead, the stack pointer is used for the register spill:

        movl    %eax, 20(%esp)          # 4-byte Spill
        calll   upc_resetphase
[...]
        subl    $4, %esp
[...]
        movl    20(%esp), %eax          # 4-byte Reload

This looks like a code generation bug, and we might be able to create a C test case for it.
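
To make the suspicion concrete, the following is a small simulation in plain C of an %esp-relative spill and reload. It is only a model of the arithmetic, under the assumption that %esp at the reload differs from %esp at the spill; whether that is actually the case depends on the instructions elided with [...] above. The function name, the buffer size, and the fill byte are all made up for illustration.

```c
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64, START_ESP = 32, SPILL_OFFSET = 20, FILL_BYTE = 0xAA };

/* Hypothetical model: spill `value` at 20(%esp), lower the simulated stack
   pointer by `shift` bytes, then reload from 20(%esp) again. */
uint32_t spill_shift_reload(uint32_t value, unsigned shift)
{
    unsigned char stack[STACK_BYTES];
    unsigned esp = START_ESP;      /* arbitrary starting point inside the buffer */
    uint32_t reloaded;

    if (shift > START_ESP)         /* keep the simulated esp inside the buffer */
        shift = START_ESP;

    memset(stack, FILL_BYTE, sizeof stack);   /* unrelated stack contents */

    memcpy(&stack[esp + SPILL_OFFSET], &value, sizeof value);        /* movl %eax, 20(%esp) */
    esp -= shift;                                                    /* subl $shift, %esp   */
    memcpy(&reloaded, &stack[esp + SPILL_OFFSET], sizeof reloaded);  /* movl 20(%esp), %eax */

    return reloaded;
}
```

With a shift of 0 the reload returns the spilled value. With a shift of 4 the reload reads the neighboring slot and returns the fill pattern instead, which is the kind of garbage that could end up in a pointer-to-shared field if the two stack pointer values really do disagree.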

nenadv commented 10 years ago

I did try to create a test case for this without any luck.

PHHargrove commented 9 years ago

Today I retested clang-upc on openbsd-i386 configured using --with-upc-pts=struct. The failures below were observed at runtime and are not present with --with-upc-pts=packed.

run.rpt:[bugzilla/bug276_st04]   0sec  20150224_150819  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[bugzilla/bug276]   0sec  20150224_150820  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[guts_main/resetphase1_st04]   0sec  20150224_151129  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[guts_main/resetphase1]   0sec  20150224_151130  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[guts_main/resetphase2_st04]   0sec  20150224_151130  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[guts_main/resetphase2]   0sec  20150224_151130  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[intrepid/test18_st04]   0sec  20150224_151306  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[intrepid/test18]   1sec  20150224_151307  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[bugzilla/bug88_st02]   0sec  20150224_152645  FAILED (CRASH=SIGTERM/NEW)
run.rpt:[bugzilla/bug88]   1sec  20150224_152646  FAILED (CRASH=SIGTERM/NEW)

All failures produced the same message:

[testname]: UPC error: Thread number in shared address is out of range

There was no difference between -g and -O in terms of which tests failed (though the -g run did have one test time out).

PHHargrove commented 8 years ago

I have again tried the struct PTS representation on OpenBSD and this error is still present.