Closed lifflander closed 3 years ago
This initial problem is related to FMT_CONSTEXPR. After manually removing FMT_CONSTEXPR, the next error is:
/lscratch1/jliffla/vt-clean/vt-auto-build/vt-release-test-message/vt/vt/src/vt/trace/trace_user_event.h(126): error: no default constructor exists for class "std::unordered_map<vt::trace::UserEventIDType={int64_t={long}}, std::string, std::hash<long>, std::equal_to<vt::trace::UserEventIDType={int64_t={long}}>, std::allocator<std::pair<const vt::trace::UserEventIDType={int64_t={long}}, std::string>>>"
std::unordered_map<UserEventIDType, std::string> user_event_ = {};
^
I removed the = {} initializers from many of the unordered_maps in the code, since those initializers seem to trigger a compiler bug. But now, I'm stuck on an internal compiler error:
[ 1%] Building CXX object src/CMakeFiles/vt.dir/vt/group/collective/group_info_collective.cc.o
": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.
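The initializer change described above (made before hitting the internal compiler error) can be sketched roughly as follows. This is a minimal illustration, not the actual VT source: the struct name UserEventRegistrySketch and the helper startsEmpty are hypothetical, and only the member declaration and the UserEventIDType alias come from the error message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Alias taken from the error message (int64_t); everything else is illustrative.
using UserEventIDType = std::int64_t;

// Hypothetical stand-in for the class in trace_user_event.h.
struct UserEventRegistrySketch {
  // Before: std::unordered_map<UserEventIDType, std::string> user_event_ = {};
  // After: no initializer; the member is still default-constructed as an empty map.
  std::unordered_map<UserEventIDType, std::string> user_event_;
};

// Hypothetical helper: checks that dropping "= {}" leaves the map empty.
bool startsEmpty() {
  UserEventRegistrySketch registry;
  assert(registry.user_event_.size() == 0);
  return registry.user_event_.empty();
}
```

Dropping the initializer should not change behavior, since the map's default constructor produces the same empty container either way.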
For reference, it looks like the /= operator was causing an internal compiler error for Kokkos:
https://github.com/kokkos/kokkos-kernels/issues/607
Maybe we are doing something similar?
There are a few /= in fmt that might be culprits here?
That could be it? Not sure how to determine for sure what the ICE is from.
I've created containers with 3 Intel compiler versions for testing: https://hub.docker.com/repository/docker/lifflander1/icc
I tried to build develop on Intel 19.1.1, the latest version, which we haven't tried so far (AFAIK). I'm still getting ICEs and some template failures in the newer RDMA handlers code (class specialization on NodeType vs Index).
One ICE is from this line of code in collective_scope.cc:
r->reduce<collective::None>(collective_root, msg.get(), cb, stamp);
When I comment it out, the file compiles.
In collective_scope.cc, I can narrow it down to this line in reduceImmediate:
auto const han = auto_registry::makeAutoHandler<MsgT,f>();
In makeAutoHandler, replacing RunType::idx with a literal constant 1 lets collective_scope.cc compile.
Way down in registerActiveGen, replacing the return statement with return 1; makes that happy.
With RegistrarGen<RunnableT, RegT, InfoT, FnT>::RegistrarGen(), apparently
FnT fn = reinterpret_cast<FnT>(AdapterType::getFunction());
is unhappy.
Just calling AdapterType::getFunction() is unhappy, so it's not the reinterpret_cast that's the problem.
Blocks #895
On ascic170, I just found (kinda by accident) that icpc (ICC) 19.0.2.187 20190117 reports various compilation errors, while icpc (ICC) 19.1.2.254 20200623 segfaults. So, there's some room for bisection here!
icpc (ICC) 19.0.5.281 20190815 reports an error rather than crashing, too.
When I fix the putative error (actually, it was a compiler bug - default value of 0 for a pointer parameter to a template was not being interpreted as nullptr), that still crashes.
The full range of 19.x compilers crash with that change made.
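For the default-argument issue mentioned above, one way to avoid depending on how a compiler treats 0 is to spell the default as nullptr. The sketch below is only an illustration under that assumption; HandlerFnSketch, hasHandler, sampleHandler, and checkDefaults are hypothetical names and are not taken from the VT code.

```cpp
#include <cassert>

// Hypothetical function type standing in for the real handler signature.
using HandlerFnSketch = void();

// Hypothetical template: the pointer parameter defaults to nullptr, not 0,
// so no integer-to-pointer interpretation is needed.
template <typename MsgT, HandlerFnSketch* f = nullptr>
bool hasHandler() {
  return f != nullptr;
}

// Hypothetical handler used only to supply a non-null template argument.
void sampleHandler() {}

// Hypothetical helper exercising both the defaulted and explicit forms.
bool checkDefaults() {
  bool const defaulted = hasHandler<int>();
  bool const explicitFn = hasHandler<int, sampleHandler>();
  assert(!defaulted);
  return !defaulted && explicitFn;
}
```

As noted in the thread, making this change alone did not stop the crash; it only removes the spurious error reported by the older 19.0.x compilers.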
Here's the fully reduced test case, courtesy of CReduce and follow-up reduction and analysis by hand:
using ActiveTypedFnType = void();

template <typename MsgT>
void basicHandler();

// *UN*commenting this makes the compiler run successfully;
// is the issue that instantiation is triggered by the template default argument?
//auto fn = &basicHandler<int>;

template <
  typename MsgT,
  ActiveTypedFnType f = basicHandler<MsgT>
>
void reduce() {
  auto foo = f;
}

void __trans_tmp_2() {
  // Or uncommenting this
  //auto g = &basicHandler<int>;
  reduce<
    int
    // Or uncommenting this
    //, basicHandler<int>
  >();
}
And, confirmed via godbolt.org that it crashes icc 21.1.9 as well!
There should then be a reasonable workaround for this: interpose a function or typedef that performs the instantiation in-line, rather than leaving it to the template default argument.
I've got the workaround implemented in reduce.h, and it seems to be accepted. There are a couple other spots likely affected. PR should be posted shortly.
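A rough sketch of what such an interposed function could look like, based on the reduced test case rather than on the actual change in reduce.h. The names reduceWith and g_handler_calls are hypothetical, and basicHandler is given a body here only so the example links and can be checked.

```cpp
#include <cassert>

using ActiveTypedFnType = void();

// Hypothetical counter so the effect of the call is observable.
int g_handler_calls = 0;

template <typename MsgT>
void basicHandler() {
  ++g_handler_calls;
}

// The handler parameter no longer has a default argument.
template <typename MsgT, ActiveTypedFnType* f>
void reduceWith() {
  auto foo = f;
  foo();
}

// Interposed wrapper: basicHandler<MsgT> is named explicitly in the body,
// so its instantiation does not come from a template default argument.
template <typename MsgT>
void reduce() {
  reduceWith<MsgT, basicHandler<MsgT>>();
}
```

Callers that relied on the default keep writing reduce<int>(), while callers with their own handler would use the two-parameter form directly. Whether this exact shape avoids the ICE on every affected compiler version is an assumption; the thread only confirms that an explicit argument in the reduced case does.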
Describe the bug