chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Internal error from chpl when using IBM compilers #11931

Open PHHargrove opened 5 years ago

PHHargrove commented 5 years ago

I am testing Wednesday's master (c69226e) with IBM XL compilers on a little-endian PPC64 system.
I made a 1-line change in order to complete the build:

--- a/make/compiler/Makefile.ibm
+++ b/make/compiler/Makefile.ibm
@@ -58,7 +58,7 @@ LIB_DYNAMIC_FLAG = -qmkshrobj -X64
 # TODO: Set the target architecture for optimization (e.g. -qtune)
 # TODO: Set flag for lax or IEEE floating point (e.g. -qfloat)

-CXX11_STD := unknown
+CXX11_STD := -std=c++11

 #
 # Flags for turning on warnings for C++/C code

However, when I use the resulting chpl, it blows up on every test I've tried:

chpl -o hello6-taskpar-dist hello6-taskpar-dist.chpl
internal error: UTI-MIS-0597 chpl version 1.19.0
Internal errors indicate a bug in the Chapel compiler ("It's us, not you"),
and we're sorry for the hassle.  We would appreciate your reporting this bug --
please see https://chapel-lang.org/bugs.html for instructions.  In the meantime,
the filename + line number above may be useful in working around the issue.

While not directly related to the current problem, I did notice that the text "the filename + line number above" seems out of place, since no such info appears above.

Interestingly, another build, with CHPL_COMM_DEBUG=1 CHPL_DEVELOPER=1, does not crash.

CHPL env:

CHPL_TARGET_PLATFORM: linux_ppc_le64
CHPL_TARGET_COMPILER: ibm
CHPL_TARGET_MACHINE: ppc64le
CHPL_TARGET_ARCH: none
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet +
  CHPL_COMM_SUBSTRATE: smp +
  CHPL_GASNET_SEGMENT: fast
CHPL_TASKS: qthreads
CHPL_LAUNCHER: smp
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: locks
  CHPL_NETWORK_ATOMICS: none
CHPL_GMP: none
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_AUX_FILESYS: none

Back-end:

$ xlc -qversion
IBM XL C/C++ for Linux, V16.1.0 (Community Edition)
Version: 16.01.0000.0000

bradcray commented 5 years ago

Thanks for taking it this far, Paul!

bradcray commented 5 years ago

Thinking about this a bit over the holidays, I found myself wondering whether the following would be a workaround for users who care about this configuration: if another C++ compiler can target PPC (gnu, say, if g++ does), set CHPL_HOST_COMPILER to that compiler and CHPL_TARGET_COMPILER to ibm. Note that printchplenv doesn't print the host information by default, but will if given the --all flag.

None of this is to say that it wouldn't still be worthwhile to understand the cause of this behavior with XL, just that using a different host compiler might be a more attractive workaround than building with debugging/developer settings on. Even on Crays, we tend to build the compiler with g++ for simplicity and only worry about specialized compilers on the target end.
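
A minimal sketch of that split configuration, assuming a bash-like shell, that CHPL_HOME points at the Chapel source tree, and that g++ is available on the host. The guard around printchplenv is only there so the snippet does nothing harmful outside a Chapel tree; it is not something the build requires.

```shell
# Sketch only: build the chpl compiler itself with GNU, while the runtime,
# third-party libraries, and generated code are compiled with IBM XL.
export CHPL_HOST_COMPILER=gnu
export CHPL_TARGET_COMPILER=ibm

# printchplenv hides the host settings by default; --all shows them.
# Assumption: CHPL_HOME is set and util/printchplenv lives under it.
if [ -n "${CHPL_HOME:-}" ] && [ -x "$CHPL_HOME/util/printchplenv" ]; then
  "$CHPL_HOME/util/printchplenv" --all
fi
```

Both variables presumably need to be set before rebuilding Chapel, so that the compiler and the runtime are each built with the intended toolchain.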

PHHargrove commented 5 years ago

@bradcray As you have inferred from this issue and my question in #10450, I am very interested in ensuring we run well on Summit. Since GNU and LLVM compiler families are also available (and supported), I suspect the work-around of distinct host and target compilers should be sufficient for any users who need IBM or PGI for their codes.

For my testing of GASNet-EX, the runtime build is the part that matters. So, I may (longer term) consider split HOST/TARGET compilers as a normal part of my testing. For now, I'll plan to test your suggestion for just the IBM case, and will report back when I have.

bradcray commented 5 years ago

> and will report back when I have.

Sounds good, thanks!

We ought to be using the TARGET compiler for any code that's actually part of the Chapel-generated executable (i.e., the runtime, third party libraries including GASNet, the compiler-generated code), so I'd expect using a different host compiler not to impact the part that matters (not to say that compile times don't matter, but...).

PHHargrove commented 5 years ago

> > and will report back when I have.
>
> Sounds good, thanks!

Our GASNet-EX + Chapel CI worked almost perfectly with CHPL_HOST_COMPILER=gnu CHPL_TARGET_COMPILER=ibm.

I say "almost" only because compilation timed out after 20 minutes (imposed by our CI system) on ra and ra-atomics tests. However, that appears to be time spent in the IBM compiler (verified via top).

There are a few signs of "bit rot", such as numerous warnings, including some about ignoring -qhalt=i (rough equivalent to -Werror, ironically enough). Some of the warnings are from the back-end compile, such as from jemalloc.h:

In file included from /tmp/chpl-hargrove-141035.deleteme/_main.c:2:
In file included from /tmp/chpl-hargrove-141035.deleteme/chpl__header.h:6:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/stdchpl.h:48:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/chpl-file-utils.h:24:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/qio/qio.h:25:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/qio/qbuffer.h:39:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/qio/deque.h:43:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/chpl-mem.h:58:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/mem/jemalloc/chpl-mem-impl.h:26:
/gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/third-party/jemalloc/install/linux_ppc_le64-ppc64le-none-ibm-none/include/jemalloc/jemalloc.h:198:46: warning: 1540-2990 The attribute " __attribute__((alloc_size(1, 0)))" is not supported.  The attribute is ignored.
    JEMALLOC_CXX_THROW JEMALLOC_ATTR(malloc) JEMALLOC_ALLOC_SIZE(1);
                                             ^
/gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/third-party/jemalloc/install/linux_ppc_le64-ppc64le-none-ibm-none/include/jemalloc/jemalloc.h:210:24: warning: 1540-2990 The attribute " __attribute__((alloc_size(2, 0)))" is not supported.  The attribute is ignored.
    JEMALLOC_CXX_THROW JEMALLOC_ALLOC_SIZE(2);
                       ^
2 warnings generated.

So, CHPL_TARGET_COMPILER=ibm is not looking "good", IMO, but is far from a lost cause if you choose to pursue it, for instance for Summit.

I am happy to ignore this compiler until/unless the Chapel team makes a clear statement that it matters to them. I am going to see if I can poke CHPL_TARGET_COMPILER=pgi on POWER next, and will update #10450 when I do.

bradcray commented 5 years ago

While I hope we support the IBM compiler well again someday before long, I can't say that it matters enough to me at this point for you to spend more time on it. I think it would be fine to wait until we make the effort to clean up our IBM support / builds to run testing if you want. I'm encouraged that things work modulo long compile times and warnings, thanks for letting us know!