There is a segfault in collection::Holder::foreach() when nextPhaseCollective() is called after a collection has been destroyed. This is an issue for applications that have collections that are no longer relevant for later phases.
To Reproduce
This can be observed in several ways. One way is to add a few lines to examples/collection/lb_iter.cc: within the loop over phases, right before the nextPhaseCollective() call, conditionally destroy the collection:
diff --git a/examples/collection/lb_iter.cc b/examples/collection/lb_iter.cc
index 6d467b034..53c2b137c 100644
--- a/examples/collection/lb_iter.cc
+++ b/examples/collection/lb_iter.cc
@@ -130,6 +130,10 @@ int main(int argc, char** argv) {
       fmt::print("iteration: iter={},time={}\n", i, total_time);
     }
+    if (i == num_iter-1) {
+      vt::theCollection()->destroy(proxy);
+    }
+
     vt::thePhase()->nextPhaseCollective();
   }
This happens whether or not the destroyed collection was used earlier in the phase in which it was destroyed. This can be seen even in a single-rank run.
We need to remove the proxy from CollectionManager::collect_lb_data_for_lb_ when the proxy is destroyed. Also, invokeCollectiveMsg should throw a sensible error if the proxy is not found.
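The proposed fix can be illustrated with a minimal, self-contained sketch. This is not the vt implementation: `PhaseHookRegistry`, `ProxyId`, `registerCollection`, `nextPhase`, and `invoke` are hypothetical names standing in for the collection manager, its proxy type, and the per-collection phase callbacks, and `std::runtime_error` is only one assumed way to report the error. The sketch shows the two ideas from the paragraph above: destroying a collection also erases its registered phase hook, and a lookup of a missing proxy fails with a clear message instead of dereferencing a missing entry.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a collection proxy identifier.
using ProxyId = std::uint64_t;

// Simplified model of a manager that keeps one callback per collection
// to run at each phase boundary (similar in spirit to the
// collect_lb_data_for_lb_ map named above, but not its real type).
class PhaseHookRegistry {
public:
  void registerCollection(ProxyId proxy, std::function<void()> hook) {
    hooks_[proxy] = std::move(hook);
  }

  // Destroying a collection also removes its phase hook, so a later
  // phase transition never touches the destroyed collection.
  void destroy(ProxyId proxy) {
    hooks_.erase(proxy);
  }

  // Runs every remaining hook and returns how many were invoked.
  std::size_t nextPhase() {
    std::size_t count = 0;
    for (auto& entry : hooks_) {
      entry.second();
      ++count;
    }
    return count;
  }

  // Looks up a single collection; a missing proxy produces a readable
  // error rather than undefined behavior.
  void invoke(ProxyId proxy) {
    auto it = hooks_.find(proxy);
    if (it == hooks_.end()) {
      throw std::runtime_error(
        "collection proxy " + std::to_string(proxy) +
        " not found (already destroyed?)"
      );
    }
    it->second();
  }

private:
  std::unordered_map<ProxyId, std::function<void()>> hooks_;
};
```

In this model, registering two collections, destroying one, and then calling `nextPhase()` runs only the surviving collection's hook, and calling `invoke()` on the destroyed proxy throws instead of crashing.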