Open benclifford opened 6 years ago
Here's what is going on here: Information on the state of blocks before they have connected is only known to the provider. In the initial cut of the `HighThroughputExecutor`, as well as the `ExtremeScaleExecutor`, we don't support scaling strategies that monitor the state of launched blocks.
Besides this specific situation, we don't have a good way of dealing with these failures in general. For instance, if we see a failure we just launch more blocks, because the provider is not smart enough to determine the cause of the block failure.
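To make the idea concrete, here is a rough sketch of what bounding the relaunch loop could look like. It is not Parsl code: `BlockWatchdog`, `BlockStartupError`, the two callables, and the `"FAILED"` state string are all hypothetical names invented for illustration, standing in for whatever the provider and executor would actually expose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class BlockStartupError(RuntimeError):
    """Raised when too many blocks fail before any worker connects."""


@dataclass
class BlockWatchdog:
    # Hypothetical hook: returns {block_id: state} as reported by the provider.
    poll_provider: Callable[[], Dict[str, str]]
    # Hypothetical hook: returns how many workers have connected so far.
    connected_workers: Callable[[], int]
    # Give up once this many distinct blocks have failed with no worker connected.
    max_failed_blocks: int = 3
    failed: Set[str] = field(default_factory=set)

    def check(self) -> None:
        """Record failed blocks and raise instead of silently relaunching forever."""
        for block_id, state in self.poll_provider().items():
            if state == "FAILED":  # assumed state name, not a real provider constant
                self.failed.add(block_id)
        no_workers = self.connected_workers() == 0
        if no_workers and len(self.failed) >= self.max_failed_blocks:
            raise BlockStartupError(
                f"{len(self.failed)} blocks failed before any worker connected: "
                f"{sorted(self.failed)}"
            )
```

A scaling loop would call `check()` on each iteration before deciding to launch more blocks, so a worker pool that can never start turns into an error rather than an endless stream of new blocks.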
crossref #1035
Using `parsl/tests/configs/htex_local.py`, a failing `process_worker_pool.py` (for example, not on path, or failing to start up) results in a silent hang, rather than any diagnostic information or workflow exit.

Similarly with `parsl/tests/configs/exex_local.py` and failing `mpi_worker_pool.py`.
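For the "not on path" case specifically, a cheap pre-flight check might look something like the sketch below. It only uses `shutil.which` from the standard library; the helper name `missing_executables` is made up for this example, and it assumes the check runs in the same environment the workers would be launched in, which will not hold for remote providers.

```python
import shutil
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Worker pool scripts named in this issue.
    missing = missing_executables(["process_worker_pool.py", "mpi_worker_pool.py"])
    if missing:
        print("not found on PATH: " + ", ".join(missing))
```

This does nothing for a worker pool that is found but fails to start up; that case still needs the provider or executor to report the failure.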