PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
17.51k stars, 1.64k forks

Canceling flow does not stop Vertex AI custom training job #13056

Open jeremy-thomas-roc opened 1 year ago

jeremy-thomas-roc commented 1 year ago

As the title states, when a flow is manually canceled, for example through the UI, the training job keeps running in Vertex. The user then has to cancel the training job manually in Vertex; otherwise it stays running indefinitely.

Expectation / Proposal

Canceling a flow should also cancel the Vertex AI training job it started.
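
To make the expectation concrete, here is a minimal sketch of the behavior I would hope for on cancellation. It is not Prefect's actual code. It assumes the infrastructure holds a client exposing `get_custom_job(name=...)` and `cancel_custom_job(name=...)`, the method names used by `google.cloud.aiplatform_v1.JobServiceClient`. `cancel_training_job`, `FakeJob`, and `FakeJobClient` are hypothetical names used only for illustration, so the snippet runs without GCP credentials.

```python
from dataclasses import dataclass, field

# Vertex AI JobState names that mean the job has already stopped.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}


def cancel_training_job(client, job_name: str) -> bool:
    """Send a cancel request unless the job has already stopped.

    Returns True if a cancel request was sent, False otherwise.
    """
    job = client.get_custom_job(name=job_name)
    # The real client returns an enum member; the fake below uses plain strings.
    state = getattr(job.state, "name", job.state)
    if state in TERMINAL_STATES:
        return False
    client.cancel_custom_job(name=job_name)
    return True


@dataclass
class FakeJob:
    """Hypothetical stand-in for a Vertex AI CustomJob resource."""
    state: str


@dataclass
class FakeJobClient:
    """Hypothetical in-memory stand-in for the real job service client."""
    jobs: dict
    cancelled: list = field(default_factory=list)

    def get_custom_job(self, name):
        return self.jobs[name]

    def cancel_custom_job(self, name):
        self.cancelled.append(name)
        self.jobs[name].state = "JOB_STATE_CANCELLING"


if __name__ == "__main__":
    # "job-a" is a placeholder, not a real Vertex AI resource name.
    client = FakeJobClient(jobs={"job-a": FakeJob("JOB_STATE_RUNNING")})
    print(cancel_training_job(client, "job-a"), client.cancelled)
```

With the real client, `job_name` would be the full resource name of the custom job that the infrastructure block created when the flow run started.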

Traceback / Example

I'm not sure how to provide an example, but all of our jobs run on this infrastructure and the problem occurs on every one of them, so I am confident the error is in the infrastructure block rather than elsewhere in our workflow.

I'd be happy to help, but I'm not sure I have the technical expertise to dive into Prefect's cancellation system and workflow. If the bug is contained to this repo, I may be able to figure it out, but I may need a pointer in the right direction.

acgourley commented 1 month ago

A possibly related issue: when my Vertex jobs have ended (they crashed, or I ended them myself), Prefect doesn't notice and keeps the flow runs in the Running state. If I then cancel those flow runs, they stay in "Canceling" forever. I believe this behavior is worse in Prefect 3 than it was in Prefect 2.
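
For illustration, here is a rough sketch of the reconciliation I would expect, assuming something polls the Vertex job state for each flow run. The state names come from Vertex AI's `JobState` enum and Prefect's state types. The `reconcile` function and the mapping are my own assumptions, not how Prefect works today.

```python
from typing import Optional

# Vertex AI JobState names that mean the job is no longer running.
TERMINAL_VERTEX_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}

# Assumed mapping for a flow run that still shows RUNNING after its job ended.
# A succeeded job is left out because the flow normally reports its own result.
RUNNING_FLOW_TARGET = {
    "JOB_STATE_FAILED": "CRASHED",
    "JOB_STATE_EXPIRED": "CRASHED",
    "JOB_STATE_CANCELLED": "CANCELLED",
}


def reconcile(flow_state: str, vertex_state: str) -> Optional[str]:
    """Return the state the flow run should move to, or None to leave it alone."""
    if vertex_state not in TERMINAL_VERTEX_STATES:
        return None  # job is still queued, pending, or running
    if flow_state == "CANCELLING":
        # Nothing is left to cancel, so the cancellation can be finalized.
        return "CANCELLED"
    if flow_state == "RUNNING":
        return RUNNING_FLOW_TARGET.get(vertex_state)
    return None
```

Under these assumptions, a flow run stuck in "Canceling" whose Vertex job has already failed would move straight to Cancelled instead of waiting forever.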