charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.

net-linux-x86_64-*-smp-pgcc crashes in megatest #234

Closed PhilMiller closed 7 years ago

PhilMiller commented 11 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/234


http://charm.cs.illinois.edu/autobuild/old.2013_06_05__03_33/net-linux-x86_64-ibverbs-smp-pgcc.txt

The PGI compiler isn't happy with something we're doing. Our problem, or its?

Schedule-wise, 6.5.1 or 6.6, or never?

ericjbohm commented 5 years ago

Original date: 2013-06-17 21:00:55


I don't have an account on Trestles (machine has always been too small to matter for projects I work on) and neither Stampede nor Taub have PGI as far as I can tell. So I don't have access to an infiniband machine with PGI on it to work on this bug.

However, based on the rather dismal autobuild config log for Trestles pgcc verbs smp:

checking "whether asm eieio assembly works"... "no"
checking "whether __thread (Thread Local Storage) is supported"... "no"
checking "whether synchronization primitives (__sync_add_and_fetch) works in C"... "no"
checking "whether synchronization primitives (__sync_synchronize) works in C"... "no"
checking "whether fence intrinsic primitives (__builtin_Xfence_ia32) works in C"... "no"
checking "whether switching TLS register (64-bit) is supported"... "yes"

SMP performance is probably going to be underwhelming even if we fixed it to work correctly under those conditions.

PhilMiller commented 5 years ago

Original date: 2013-06-17 21:25:28


You skipped the couple of lines before it, which use inline assembly:

checking "whether GCC x86 assembly works"... "yes"
checking "whether GCC x86 assembly for atomic increment works"... "yes"

The performance of those should be just about the same as the compiler intrinsics.
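
For readers unfamiliar with the two approaches being compared, here is a minimal sketch of an atomic increment written once with GCC-style x86 inline assembly and once with the `__sync_add_and_fetch` intrinsic. This is not Charm++'s actual code: the function names (`atomic_inc_asm`, `atomic_inc_builtin`, `count_with_both`) are hypothetical, and it assumes a compiler that accepts GCC extended asm and the `__sync` builtins. On x86 both variants typically come down to a single `lock`-prefixed instruction, which is why their performance would be expected to be similar.

```c
#include <assert.h>

/* Hypothetical helper: atomic increment via GCC-style inline assembly.
 * On non-x86 targets this sketch falls back to the intrinsic. */
static inline void atomic_inc_asm(volatile int *p) {
#if defined(__x86_64__) || defined(__i386__)
    /* "lock" makes the read-modify-write atomic across cores;
     * "+m" marks *p as a read-write memory operand. */
    __asm__ __volatile__("lock; incl %0" : "+m"(*p) : : "memory", "cc");
#else
    (void)__sync_add_and_fetch(p, 1);
#endif
}

/* Hypothetical helper: the same operation via the compiler intrinsic.
 * Returns the value after the increment. */
static inline int atomic_inc_builtin(volatile int *p) {
    return __sync_add_and_fetch(p, 1);
}

/* Single-threaded sanity check: both variants should count to n.
 * This only shows that the code compiles and increments; it does not
 * prove atomicity under contention. */
static int count_with_both(int n) {
    volatile int a = 0, b = 0;
    for (int i = 0; i < n; i++) {
        atomic_inc_asm(&a);
        atomic_inc_builtin(&b);
    }
    assert(a == b);
    return a;
}
```

Calling `count_with_both(1000)` from a test program exercises both paths; a real correctness check would need several threads hammering the same counter.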

ericjbohm commented 5 years ago

Original date: 2013-06-17 21:38:41


Yes, but I am suspicious of the use of those being correct in PGI.

PhilMiller commented 5 years ago

Original date: 2013-06-17 21:56:37


How about TACC's Lonestar?

ericjbohm commented 5 years ago

Original date: 2013-06-19 22:52:39


(I don't have a Lonestar account, but I was able to investigate Trestles via buildcharm.)

The bug appears to be unrelated to ibverbs. (subject line changed accordingly) I can produce the same problem in net-linux-x86_64-smp-pgcc. It persists across -memory [os|ptmalloc|gnu] and -thread [generic|context|uJcontext]. GCC is not afflicted with this problem.

On a related note, Trestles has a rather antiquated version of pgcc (10.5-0, May 2010). However, trying a newer PGI installation elsewhere (such as on BlueWaters) just moves the problem from runtime to compile time: PGI 13.3-0 crashes during compilation of ckarray.C with:

PGCC-S-0000-Internal compiler error. union_find_last_lp_per_handler:empty throw_bih 0 (ckarray.C: 635)

This exhausts my patience with PGI at this time.

Recommended workaround: compile using ICC, or GCC. Unless you're a masochist, in which case feel free to try compiling with PGI while standing on broken glass.

PhilMiller commented 5 years ago

Original date: 2013-06-19 23:26:08


An important note on that finding: the ICE occurs only when optimization at -O2 or -O3 is enabled; compilation works with -O1. I've also seen it occur with both 13.3 and 13.4.

I was about to test 12.x, when the login nodes suddenly hung. I'll take another look later.

PhilMiller commented 5 years ago

Original date: 2013-06-19 23:51:35


I just tried 12.8 and 12.10 - ICE is not present.

ericjbohm commented 5 years ago

Original date: 2013-06-20 17:13:10


Reported to NCSA Jira. They noted the similarity with: http://www.pgroup.com/userforum/viewtopic.php?t=3885

So, it might be fixed in 13.6.0, which might see the light of day on a system we have access to eventually.

Shifting target to 6.6.0, since we're unlikely to have a solution to the original issue on Trestles any time soon (perhaps never if they don't upgrade PGI).

ericjbohm commented 5 years ago

Original date: 2013-08-28 18:46:32


PGCC should not be tested on the old version available on Trestles. Target will shift to Hopper when we have smp debugged.

ericjbohm commented 5 years ago

Original date: 2013-10-02 19:44:07


Given that the releases over the past few years either fail to compile, fail to link, or generate code that segfaults, I feel that we shouldn't regard PGI as a production compiler for Charm++. It is not worth our time unless one of our collaborators really needs PGI for some reason.

stwhite91 commented 5 years ago

Original date: 2017-02-01 18:40:31


Closing because net- is deprecated, pgcc is generally unable to compile Charm, and users have not requested PGI support.