charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

Some CkCallback types are not valid across checkpoint/restart #159

Open PhilMiller opened 11 years ago

PhilMiller commented 11 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/159


Per #158, many types of CkCallback contain (possibly transitively, through other structures) raw pointers to objects in the system, like chares (via CkChareID) and functions. These callbacks cannot survive recovery from the kind of application-level checkpoints that Charm++ performs, because their targets may have changed in address from one execution to the next. In the chare case, we can potentially use a less transient identifier like chareIdx if that's stable and usable across restart. If chares get folded into the fixed-size global object ID work (#108), then that will apply to callbacks as well, and this will be fixed.

I'm less sure how to handle functions. It might be possible to have them registered explicitly and referenced by some ID instead of by pointer, but I'm uncertain whether that would actually work in the restart case either, unless the registration were in some very low-level code run at every process launch. If initnode calls happen even during restart, then that may suffice, but whoever works on this would have to check this pretty carefully.
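To make the register-by-ID idea above concrete, here is a minimal standalone sketch. It is not Charm++ API: `CallbackFn`, `registerCallbackFn`, `lookupCallbackFn`, `CheckpointableCallback`, and `registerAll` are all hypothetical names invented for illustration. `registerAll` stands in for whatever low-level startup hook runs on every process launch, and the sketch assumes that functions are registered in the same order on every launch, including restart, which is exactly the property that would need to be checked carefully.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical callback signature; real callbacks would take a message type.
using CallbackFn = void (*)(void* msg);

// Process-local table. The index into it is the stable ID; the pointer
// stored at that index may differ from one launch to the next.
static std::vector<CallbackFn>& registry() {
    static std::vector<CallbackFn> table;
    return table;
}

// Assumption: called in the same order on every launch, restart included.
std::uint32_t registerCallbackFn(CallbackFn fn) {
    registry().push_back(fn);
    return static_cast<std::uint32_t>(registry().size() - 1);
}

// Returns nullptr for an ID that was never registered in this process.
CallbackFn lookupCallbackFn(std::uint32_t id) {
    return id < registry().size() ? registry()[id] : nullptr;
}

// What would go into a checkpoint: an ID, never a raw function pointer.
struct CheckpointableCallback {
    std::uint32_t fnId;
};

// Resolves the ID against the current process's table, then calls it.
bool invoke(const CheckpointableCallback& cb, void* msg) {
    CallbackFn fn = lookupCallbackFn(cb.fnId);
    if (fn == nullptr) return false;
    fn(msg);
    return true;
}

// Two example targets that record that they ran.
static int g_counter = 0;
void onReductionDone(void*) { g_counter += 1; }
void onIoDone(void*) { g_counter += 10; }

// Stand-in for a per-launch startup hook: ID 0 and ID 1, in fixed order.
void registerAll() {
    registry().clear();
    registerCallbackFn(onReductionDone);
    registerCallbackFn(onIoDone);
}

// Simulates "checkpoint, restart, invoke" within a single process.
int demoRestart() {
    registerAll();                       // first launch
    CheckpointableCallback saved{1};     // names onIoDone by ID
    registerAll();                       // simulated restart rebuilds the table
    g_counter = 0;
    bool ok = invoke(saved, nullptr);    // resolved against the new table
    assert(ok);
    return g_counter;
}
```

The design choice here is that only the integer ID crosses the checkpoint boundary, so address changes between executions do not matter as long as the table is rebuilt identically before any restored callback fires. Whether initnode calls actually provide that guarantee during restart remains the open question raised above.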

ericjbohm commented 5 years ago

Original date: 2013-08-28 19:12:38


Parcel out sub components of this task as needed.

PhilMiller commented 5 years ago

Original date: 2017-04-04 20:41:04


This doesn't seem to affect any current applications, so deferring.

PhilMiller commented 5 years ago

Original date: 2017-12-06 17:56:04


Eric, Ronak: what's the status of using 64-bit IDs to name plain chares?

epmikida commented 5 years ago

Original date: 2017-12-06 21:02:29


I did some exploration to get this integrated. Fully updating singleton chare IDs to 64-bit IDs would take a lot of work, because of the number of different chare IDs already used in various places and the fact that they aren't always used as pure IDs. A quick-and-dirty fix to give every singleton chare a 64-bit ID is more doable, but I'm not sure how worthwhile it would be: it may necessitate multiple API changes, first adding another ID to a chare as a temporary fix and then later updating the API again to remove the obsolete IDs.

If this particular bug is critical, the plain IDs could be added quickly to (maybe) address it.

Ronak may have more input?

juanjgalvez commented 5 years ago

Original date: 2019-04-04 19:36:13


Regarding callbacks that use function pointers, we concluded in the core meeting that we will disallow this use of callbacks with checkpoint/restart. Instead, users should use entry method callbacks.

evan-charmworks commented 4 years ago

Unscheduling because ASLR workarounds are in place and the documentation describes which kinds of callbacks are suitable in what circumstances.