Open drodarie opened 1 month ago
Hmm, there might be a discrepancy between how `Placement`/`Conn` `Job`s are serialized and how the `AfterConn` and `AfterPlacement` jobs are, because those were done sort of hurriedly afterwards. Since we have a lot of non-picklable objects, the serialization works as follows:
- `execute` handler
- `MPIPool` happens via the `dispatcher` function.
- `execute` method retrieved
- `execute` method contains the logic needed to reconstruct the job from the arguments, e.g., a `PlacementJob` will take the `scaffold.placement[node_name_from_dispatcher_args].place` method and run it.

I suspect that either there is an `After*`-specific difference here, or that it is due to:
```python
cfg = Configuration.default(
    storage={"engine": "hdf5", "root": "network.hdf5"},
    after_connectivity={"test_after_conn": TestAfterConn()},
)
```
the object in the conf, but I feel like we would have noticed this in other places then too.
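
To make the flow I described concrete, here is a minimal sketch of that kind of dispatch pattern. This is not the actual BSB code: `Scaffold`, `PlacementNode`, `JOB_TYPES`, `serialize` and `is_picklable` are hypothetical stand-ins I made up for illustration, and the real `dispatcher` signature may well differ. The idea is only that the picklable arguments cross the wire, and the worker rebuilds the job from them.

```python
import pickle
import threading


class PlacementNode:
    """Hypothetical stand-in for a placement strategy in the config."""

    def __init__(self, name):
        self.name = name

    def place(self, chunk):
        return f"placed {self.name} in chunk {chunk}"


class Scaffold:
    """Hypothetical stand-in for an object that cannot be pickled."""

    def __init__(self):
        self._lock = threading.Lock()  # lock objects cannot be pickled
        self.placement = {"cells": PlacementNode("cells")}


class Job:
    """Holds only picklable arguments; the worker rebuilds the rest."""

    def __init__(self, *args):
        self.args = args

    def serialize(self):
        # Send the job type by name plus its plain arguments.
        return (type(self).__name__, self.args)

    @staticmethod
    def execute(scaffold, args):
        raise NotImplementedError


class PlacementJob(Job):
    @staticmethod
    def execute(scaffold, args):
        # Reconstruct the work from the arguments: look the node up by name.
        node_name, chunk = args
        return scaffold.placement[node_name].place(chunk)


# Assumed registry mapping job type names back to their classes.
JOB_TYPES = {"PlacementJob": PlacementJob}


def dispatcher(scaffold, job_type, args):
    """Runs on the worker: retrieve the execute handler and call it."""
    handler = JOB_TYPES[job_type].execute
    return handler(scaffold, args)


def is_picklable(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


scaffold = Scaffold()
payload = PlacementJob("cells", (0, 0, 0)).serialize()
job_type, args = pickle.loads(pickle.dumps(payload))  # simulate the wire
result = dispatcher(scaffold, job_type, args)
```

If the `After*` jobs are meant to follow the same pattern, only the hook's name (e.g. `"test_after_conn"`) would need to be sent, never the `TestAfterConn()` instance itself. That is an assumption on my part, but something like `is_picklable` on whatever actually gets submitted would be a quick way to check.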
PS: Since `options.debug_pool` is on, shouldn't there be a lot more logging before the error?
Produce the following stacktrace and then get stuck: