PrefectHQ / prefect-demos

Sales Engineering demo repository

Infrastructure Failover Using Metric Triggers #74

Open masonmenges opened 6 months ago

masonmenges commented 6 months ago

Showcase resiliency patterns that can be configured in Prefect using metric triggers, potentially composite triggers, and automations.

Goal: automated failover of scheduled work between two environments/work pools. Set up a pair of scheduled deployments (A and B) with distinct tags, and deploy each to its own work pool (one per environment). Initially, deployment A has an active schedule and deployment B's schedule is inactive. Deployment A should be intentionally designed to fail 50-75% of the time. (Is there a cleaner way of doing this? Could flow-run overrides be used to force a flow run onto a different work pool?)
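A minimal sketch of the "intentionally failing" body for deployment A, in plain Python so it runs without a Prefect installation. The names `flaky_work`, `SimulatedFailure`, and `failure_rate` are hypothetical and not from this repo; in the actual demo this function would presumably be called from (or decorated as) the flow behind deployment A, so that the raised exception leaves the flow run in a failed state.

```python
import random
from typing import Optional


class SimulatedFailure(RuntimeError):
    """Raised on purpose so the surrounding run ends up failed."""


def flaky_work(failure_rate: float = 0.6, rng: Optional[random.Random] = None) -> str:
    """Fail with probability `failure_rate`, otherwise return "ok".

    `failure_rate` is a hypothetical knob; 0.5 to 0.75 matches the
    range described in this issue.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # Accept an injected RNG so the behaviour can be seeded in tests.
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so 1.0 always fails and 0.0 never does.
    if rng.random() < failure_rate:
        raise SimulatedFailure(f"simulated failure (rate={failure_rate})")
    return "ok"
```

Exposing the rate as a parameter would let the same code back both deployments: A with a high rate, B with `failure_rate=0.0`.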

Set up an automation triggered by a metric that monitors the success rate of flow runs with a certain tag (the tag for deployment A). If the average success rate falls below 50%, the automation should pause the schedule for deployment A, enable the schedule for deployment B, and send a notification alerting on the failure. Maybe include a composite trigger to ensure no successful flow run events have been emitted? It could also open an incident to manage/triage the failure in the failed deployment.
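The decision this automation is meant to encode can be sketched in plain Python. This is only an illustration of the logic, not Prefect's automation API: `decide_failover`, `FailoverDecision`, and the action labels are hypothetical, and the state names are assumed to mirror Prefect's terminal state types (`COMPLETED`, `FAILED`, `CRASHED`). The optional `require_no_successes` flag stands in for the composite-trigger idea of also requiring that no successful runs were seen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FailoverDecision:
    success_rate: Optional[float]  # None when there were no runs to measure
    actions: List[str] = field(default_factory=list)


def decide_failover(
    states: Sequence[str],
    threshold: float = 0.5,
    require_no_successes: bool = False,
) -> FailoverDecision:
    """Decide whether to fail over from deployment A to deployment B.

    `states` holds the terminal states of recent tagged flow runs.
    """
    if not states:
        # No data in the window: do nothing rather than guess.
        return FailoverDecision(success_rate=None)

    successes = sum(1 for s in states if s == "COMPLETED")
    rate = successes / len(states)

    should_fail_over = rate < threshold
    if require_no_successes:
        # Composite condition: low success rate AND zero successful runs.
        should_fail_over = should_fail_over and successes == 0

    if not should_fail_over:
        return FailoverDecision(success_rate=rate)

    # Hypothetical action labels mirroring the steps listed in this issue.
    return FailoverDecision(
        success_rate=rate,
        actions=[
            "pause_schedule:deployment-a",
            "resume_schedule:deployment-b",
            "send_notification",
        ],
    )
```

For example, `decide_failover(["COMPLETED", "FAILED", "FAILED", "FAILED"])` yields a success rate of 0.25 and all three actions, while the same input with `require_no_successes=True` yields no actions because one run did succeed.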

Optional: show multi-cloud failover. Switch to a work pool on another cloud provider when a metric or flow fails, i.e. switch a critical workflow over to another provider. This shows the resiliency Prefect can help provide.