Open efranksrecroom opened 1 year ago
@efranksrecroom can you please create a reproduction? You can simulate a crashed subflow run by having it raise a base exception.
This seems to roughly repro:
import time

from prefect import flow, get_run_logger


class GoBoom:
    def generate_subflow(self, logger):
        # The subflow sleeps, then raises BaseException to simulate a crash.
        @flow(name='Raise BaseException', retries=3, retry_delay_seconds=10)
        def sub_flow(logger):
            logger.info("Starting")
            time.sleep(30)
            raise BaseException()

        sub_flow(logger)


# The parent flow's timeout (30s) matches the subflow's sleep.
@flow(timeout_seconds=30, retries=1, retry_delay_seconds=25)
def foo_bar():
    go_boom = GoBoom()
    logger = get_run_logger()
    go_boom.generate_subflow(logger)


if __name__ == "__main__":
    foo_bar()
The net result of running this is a crashed subflow and a parent flow that is still marked as Running even though it has exceeded the 30-second timeout that I specified.
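For anyone wondering why the repro raises BaseException instead of a regular exception: in plain Python, an `except Exception` handler does not catch a bare BaseException, so it escapes ordinary failure handling. The sketch below only illustrates that language behavior. `SimulatedCrash`, `run_with_failure_handling`, and `classify` are hypothetical names made up for illustration, and nothing here is Prefect's actual engine code.

```python
class SimulatedCrash(BaseException):
    """Hypothetical crash signal; derives from BaseException, not Exception."""


def run_with_failure_handling(fn):
    """Call fn and treat ordinary exceptions as handled failures."""
    try:
        fn()
        return "completed"
    except Exception:
        # ValueError, RuntimeError, etc. are caught here.
        return "failed"


def classify(fn):
    """Label whatever escapes the ordinary failure handler as a crash."""
    try:
        return run_with_failure_handling(fn)
    except BaseException:
        # Only reached when the error is not an Exception subclass.
        return "crashed"


def raises_exception():
    raise ValueError("ordinary failure")


def raises_base_exception():
    raise SimulatedCrash()


print(classify(raises_exception))
print(classify(raises_base_exception))
```

The first call is handled as an ordinary failure, while the second bypasses the `except Exception` block entirely, which is presumably why raising a base exception was suggested as a way to simulate a crashed run.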
Thank you! We'll investigate this :)
@zanieb did you ever get to the bottom of this? We're facing this issue in Prefect 3 still.
First check
Bug summary
We have a flow that fires a subflow to run a sync with FiveTran. This flow has a timeout of 55 minutes defined. What we've found is that if the subflow ends up in a Crashed state, the parent flow will hit this timeout but never be marked as TimedOut; it will stay marked as "Running" with no ability to cancel it. More importantly, it never shows up as Failed/Crashed/Late and thus will not trigger an alert.
Note in the image that the Start Time for this flow was 12:57:42 AM, and this screenshot was taken at ~11:00 AM the same day. The "Duration" of the job is 55m 1s, which is roughly the 55m timeout that we have set for this flow. As you can see, the flow is still in a "Running" state.
Now observe that the subflow is listed as "Crashed".
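To make the stuck condition concrete, here is a minimal sketch of the check that would flag such a run: anything still marked Running after its configured timeout has elapsed. `RunRecord` and `find_overdue_runs` are hypothetical names and not Prefect types or API calls. The parent's start time, the 55-minute timeout, and the ~11:00 AM observation time come from the description above, while the calendar date and the subflow's start time are arbitrary placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RunRecord:
    # Hypothetical record for illustration; not a Prefect type.
    name: str
    state: str
    start_time: datetime
    timeout: timedelta


def find_overdue_runs(runs, now):
    """Return runs still marked Running after their timeout has elapsed."""
    return [
        r for r in runs
        if r.state == "Running" and now - r.start_time > r.timeout
    ]


# Parent values mirror the report; the date and the subflow start time are placeholders.
parent = RunRecord("parent-flow", "Running",
                   datetime(2024, 1, 1, 0, 57, 42), timedelta(minutes=55))
child = RunRecord("subflow", "Crashed",
                  datetime(2024, 1, 1, 0, 57, 50), timedelta(minutes=55))

overdue = find_overdue_runs([parent, child], now=datetime(2024, 1, 1, 11, 0, 0))
print([r.name for r in overdue])
```

Only the parent is flagged here: it is Running roughly ten hours past a 55-minute timeout, whereas the subflow has already reached a terminal state.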
Reproduction
Error
Versions
Additional context
No response