Open PHHargrove opened 5 years ago
Thanks for taking it this far Paul!
Thinking about this a bit over the holidays, I found myself wondering if the following would represent a workaround for users who might care about this configuration: If there are other C++ compilers that can target PPC, it'd be interesting to set CHPL_HOST_COMPILER to that compiler (gnu, say, if g++ did), and CHPL_TARGET_COMPILER to ibm (note that printchplenv doesn't print out the host information by default, but will if given the --all flag).
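For anyone wanting to try the split host/target setup described here, a minimal sketch follows. It assumes g++ is usable as the host compiler on the system, that CHPL_HOME points at the Chapel source tree, and that printchplenv lives under its util/ directory; the guard around the call is only there so the snippet is harmless on a machine without Chapel.

```shell
# Assumption: g++ works as the host compiler on this system.
export CHPL_HOST_COMPILER=gnu
export CHPL_TARGET_COMPILER=ibm

# printchplenv omits the host settings unless --all is given.
# Guarded (assumed layout: $CHPL_HOME/util/printchplenv) so nothing
# fails where Chapel is not set up.
if [ -n "${CHPL_HOME:-}" ] && [ -x "$CHPL_HOME/util/printchplenv" ]; then
    "$CHPL_HOME/util/printchplenv" --all
fi
```

The two variables need to be set before building, so that the chpl compiler itself is built with the host compiler while the runtime and third-party libraries use the target compiler.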
None of this is to say that it wouldn't still be worthwhile to understand the cause of this behavior with XL, just that using a different host compiler might be a more attractive workaround than building with debugging/developer settings on. (Even on Crays, we tend to build the compiler with g++ for simplicity and only worry about specialized compilers on the target end.)
@bradcray As you have inferred from this issue and my question in #10450, I am very interested in ensuring we run well on Summit. Since GNU and LLVM compiler families are also available (and supported), I suspect the work-around of distinct host and target compilers should be sufficient for any users who need IBM or PGI for their codes.
For my testing of GASNet-EX, the runtime build is the part that matters. So, I may (longer term) consider split HOST/TARGET compilers as a normal part of my testing. For now, I'll plan to test your suggestion for just the IBM case, and will report back when I have.
and will report back when I have.
Sounds good, thanks!
We ought to be using the TARGET compiler for any code that's actually part of the Chapel-generated executable (i.e., the runtime, third party libraries including GASNet, the compiler-generated code), so I'd expect using a different host compiler not to impact the part that matters (not to say that compile times don't matter, but...).
Our GASNet-EX + Chapel CI worked almost perfectly with CHPL_HOST_COMPILER=gnu CHPL_TARGET_COMPILER=ibm.
I say "almost" only because compilation timed out after 20 minutes (imposed by our CI system) on the ra and ra-atomics tests. However, that appears to be time spent in the IBM compiler (verified via top).
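A time limit like the one the CI imposes can be approximated locally with the coreutils timeout command. The sketch below is not how the CI system does it; run_with_limit is a hypothetical helper, and sleep stands in for a long-running compile so that the snippet runs anywhere.

```shell
# Hypothetical helper: run a command under a time limit and say so on timeout.
run_with_limit() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with status 124 when the limit is hit.
    if [ "$status" -eq 124 ]; then
        echo "timed out after $limit"
    fi
    return "$status"
}

# Stand-in for a slow compile: a 5-second sleep under a 1-second limit.
run_with_limit 1 sleep 5 || true
```

For the case described above, the limit would be 20m and the command the chpl invocation for the test in question.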
There are a few signs of "bit rot", such as numerous warnings, including some about ignoring -qhalt=i (rough equivalent to -Werror, ironically enough). Some of the warnings are from the back-end compile, such as from jemalloc.h:
In file included from /tmp/chpl-hargrove-141035.deleteme/_main.c:2:
In file included from /tmp/chpl-hargrove-141035.deleteme/chpl__header.h:6:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/stdchpl.h:48:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/chpl-file-utils.h:24:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/qio/qio.h:25:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/qio/qbuffer.h:39:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/qio/deque.h:43:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/chpl-mem.h:58:
In file included from /gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/runtime/include/mem/jemalloc/chpl-mem-impl.h:26:
/gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/third-party/jemalloc/install/linux_ppc_le64-ppc64le-none-ibm-none/include/jemalloc/jemalloc.h:198:46: warning: 1540-2990 The attribute " __attribute__((alloc_size(1, 0)))" is not supported. The attribute is ignored.
JEMALLOC_CXX_THROW JEMALLOC_ATTR(malloc) JEMALLOC_ALLOC_SIZE(1);
^
/gpfs/alpine/csc296/scratch/hargrove/upcnightly-summitdev/EX-summitdev-ibv-xlc/runtime/work/dbg/chapel/third-party/jemalloc/install/linux_ppc_le64-ppc64le-none-ibm-none/include/jemalloc/jemalloc.h:210:24: warning: 1540-2990 The attribute " __attribute__((alloc_size(2, 0)))" is not supported. The attribute is ignored.
JEMALLOC_CXX_THROW JEMALLOC_ALLOC_SIZE(2);
^
2 warnings generated.
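When there are many such warnings, tallying them by diagnostic code makes it easier to see which ones dominate. A minimal sketch, using the two warning lines above as sample input (the long include paths are shortened to the bare file name here purely for readability):

```shell
# Sample input: the two warnings shown above, paths shortened for readability.
log='jemalloc.h:198:46: warning: 1540-2990 The attribute " __attribute__((alloc_size(1, 0)))" is not supported. The attribute is ignored.
jemalloc.h:210:24: warning: 1540-2990 The attribute " __attribute__((alloc_size(2, 0)))" is not supported. The attribute is ignored.'

# Extract each diagnostic code and count how often it occurs.
summary=$(printf '%s\n' "$log" \
    | grep -o 'warning: [0-9]*-[0-9]*' \
    | sort | uniq -c \
    | awk '{print $3, $1}')
echo "$summary"
```

On a real build, the log variable would be replaced by the captured compiler output.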
So, CHPL_TARGET_COMPILER=ibm is not looking "good", IMO, but is far from a lost cause if you choose to pursue it, for instance for Summit.
I am happy to ignore this compiler until/unless the Chapel team makes a clear statement that it matters to them. I am going to see if I can poke CHPL_TARGET_COMPILER=pgi on POWER next, and will update #10450 when I do.
While I hope we support the IBM compiler well again someday before long, I can't say that it matters enough to me at this point for you to spend more time on it. I think it would be fine to wait until we make the effort to clean up our IBM support / builds to run testing if you want. I'm encouraged that things work modulo long compile times and warnings, thanks for letting us know!
I am testing Wednesday's master (c69226e) with IBM XL compilers on a little-endian PPC64 system.
I made a 1-line change in order to complete the build:
However, when I use the resulting chpl, it blows up on every test I've tried:
While not directly related to the current problem, I did notice the text "the filename + line number above" seems out-of-place since no such info appears above.
Interestingly, another build, with CHPL_COMM_DEBUG=1 CHPL_DEVELOPER=1, does not crash.
CHPL env:
Back-end: