Open pplimport opened 8 years ago
Original author: Justin Miron Original date: 2016-11-03 14:52:22
This check may not be necessary. If the PE was previously on a node that is now dead, it should call DeliverUnknown, since it may have been migrated. However, that triggers a delivery to the homePE, and if the homePE is the dead processor, the delivery will fail.
Check referred to: if((!CmiNodeAlive(destPE) && destPE != allowMessagesOnly)){ CkAbort("Cannot send to a chare on a dead node"); }
Original date: 2016-11-03 16:25:06
This was changed in the 64bit ID changes. Look at line 2635 here:
https://charm.cs.illinois.edu/gerrit/#/c/1217/ https://github.com/UIUC-PPL/charm/commit/71a0f8961609fd2bf40f62e1f337644f62734b7c5/src/ck-core/cklocation.C
Original author: Justin Miron Original date: 2016-11-03 17:31:24
Thanks, that helps a lot.
Reinserting the getNextPE code works when the next PE is found from the evacuated destPE integer. Using the CkArrayIndices leads to problems because the CkArrayIndex* passed in is sometimes NULL.
getNextPE previously used a hash of the CkArrayIndices; an equivalent is needed for the integers.
The CkAbort is now avoided, but proactive fault tolerance still hangs.
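One possible shape for an integer-based equivalent is sketched below. It is a standalone model, not Charm++ code: `AliveMap`, `mixId`, and `nextAlivePe` are hypothetical names, the liveness vector stands in for whatever CmiNodeAlive reports, and the bit mixer (the SplitMix64 finalizer) is just one reasonable choice of hash for a 64-bit ID.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the runtime's liveness query (CmiNodeAlive in
// this thread): alive[pe] is true while that PE's valid processor bit is set.
using AliveMap = std::vector<bool>;

// Mix the bits of a 64-bit ID so that consecutive IDs spread across PEs.
// This is the SplitMix64 finalizer, used here only as an illustration.
inline std::uint64_t mixId(std::uint64_t id) {
    id ^= id >> 30;
    id *= 0xbf58476d1ce4e5b9ULL;
    id ^= id >> 27;
    id *= 0x94d049bb133111ebULL;
    id ^= id >> 31;
    return id;
}

// Returns the first live PE at or after hash(id) % numPes, wrapping around.
// Returns -1 when no PE is alive (or the map is empty).
inline int nextAlivePe(std::uint64_t id, const AliveMap& alive) {
    const int numPes = static_cast<int>(alive.size());
    if (numPes == 0) return -1;
    const int start =
        static_cast<int>(mixId(id) % static_cast<std::uint64_t>(numPes));
    for (int step = 0; step < numPes; ++step) {
        const int pe = (start + step) % numPes;
        assert(pe >= 0 && pe < numPes);
        if (alive[pe]) return pe;  // skip PEs that have been evacuated
    }
    return -1;
}
```

Because the result depends only on the ID and the liveness map, every node that agrees on which PEs are alive would pick the same target, which is the property the old index hash provided. No CkArrayIndex pointer is involved, so the NULL case does not arise.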
Original author: Justin Miron Original issue: https://charm.cs.illinois.edu/redmine/issues/1279
CkLocMgr::deliverMsg attempts to send messages to evacuated chares.
Fails this check: if((!CmiNodeAlive(destPE) && destPE != allowMessagesOnly)){ CkAbort("Cannot send to a chare on a dead node"); }
CmiNodeAlive checks whether the valid processor bit is set for destPE. allowMessagesOnly should be set to the value msg->pe on every node during the ACK to the evacuation. This happens AFTER evacuation has occurred and the PE has announced its evacuation.
allowMessagesOnly is set after the valid processor bit is cleared to 0. If a message delivery is attempted between these two events, the check fails and CkAbort is called.
Investigating setting the allowMessagesOnly value first.
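The window can be illustrated with a small single-node model. This is a sketch under simplifying assumptions, not the Charm++ implementation: `NodeView` and both helper functions are hypothetical, and the real updates happen in message handlers across nodes rather than in one function.

```cpp
#include <cassert>
#include <vector>

// Toy model of one node's view of the evacuation state.
struct NodeView {
    std::vector<bool> valid;     // models the valid processor bit per PE
    int allowMessagesOnly = -1;  // PE still allowed to receive, -1 if none

    explicit NodeView(int numPes) : valid(numPes, true) {}

    // Mirrors the condition quoted above: true means the send would abort.
    bool wouldAbort(int destPE) const {
        assert(destPE >= 0 && destPE < static_cast<int>(valid.size()));
        return !valid[destPE] && destPE != allowMessagesOnly;
    }
};

// Order described in this issue: the valid bit is cleared first and
// allowMessagesOnly is set later. Returns whether a send that arrives
// between the two steps would abort.
inline bool abortsInWindowCurrentOrder(NodeView view, int evacuatedPE) {
    view.valid[evacuatedPE] = false;
    const bool aborted = view.wouldAbort(evacuatedPE);  // send arrives here
    view.allowMessagesOnly = evacuatedPE;
    return aborted;
}

// Order under investigation: allowMessagesOnly is set before the valid
// bit is cleared, so the check never sees a dead PE that is not allowed.
inline bool abortsInWindowProposedOrder(NodeView view, int evacuatedPE) {
    view.allowMessagesOnly = evacuatedPE;
    const bool aborted = view.wouldAbort(evacuatedPE);  // send arrives here
    view.valid[evacuatedPE] = false;
    return aborted;
}
```

In this model the first ordering aborts for a send that lands inside the window and the second does not. Whether the reordering is sufficient in the real runtime depends on how the two updates are propagated to every node, which the model does not capture.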