PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Expose Vertex Dynamic Workload Scheduler on Vertex Run #15417

Open dwyatte opened 2 months ago

dwyatte commented 2 months ago

Describe the current behavior

Prefect's GCP Vertex integration exposes only a subset of the arguments accepted when running a flow as a Vertex job. GCP recently integrated Vertex jobs with its Dynamic Workload Scheduler, which lets users pass an additional request parameter, scheduling, for more control over when their jobs start. This is especially useful for flows that require high-demand resources such as GPUs (e.g., by waiting up to 30 minutes for a GPU to become available from the flex start pool).

https://cloud.google.com/vertex-ai/docs/training/schedule-jobs-dws

workerPoolSpecs:
  machineSpec:
    machineType: n1-highmem-2
  replicaCount: 1
  containerSpec:
    imageUri: gcr.io/ucaip-test/ucaip-training-test
    args:
    - port=8500
    command:
    - start
scheduling:
  strategy: FLEX_START
  maxWaitDuration: 1800s

Describe the proposed behavior

The Vertex worker at https://github.com/PrefectHQ/prefect/blob/main/src/integrations/prefect-gcp/prefect_gcp/workers/vertex.py should expose the ability to specify the scheduling parameter.
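
As a rough illustration of the request shape (not the worker's actual code), the sketch below builds the job spec from the YAML above as a plain Python dictionary and attaches the scheduling block only when a strategy is given. The function name build_job_spec and its parameters are hypothetical; only the field names workerPoolSpecs, scheduling, strategy, and maxWaitDuration come from the example above.

```python
from typing import Any, Dict, Optional


def build_job_spec(
    worker_pool_specs: Dict[str, Any],
    strategy: Optional[str] = None,
    max_wait_seconds: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a job spec, adding `scheduling` only if requested."""
    job_spec: Dict[str, Any] = {"workerPoolSpecs": worker_pool_specs}
    if strategy is not None:
        scheduling: Dict[str, Any] = {"strategy": strategy}
        if max_wait_seconds is not None:
            # Durations are written as seconds with an "s" suffix, as in the YAML above.
            scheduling["maxWaitDuration"] = f"{max_wait_seconds}s"
        job_spec["scheduling"] = scheduling
    return job_spec


if __name__ == "__main__":
    pool = {
        "machineSpec": {"machineType": "n1-highmem-2"},
        "replicaCount": 1,
    }
    print(build_job_spec(pool, strategy="FLEX_START", max_wait_seconds=1800))
```

Leaving scheduling out entirely when no strategy is set would keep requests for existing deployments unchanged.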

Example Use

No response

Additional context

Partial duplicate of https://github.com/PrefectHQ/prefect/issues/5495; we might consider addressing that at the same time.

zzstoatzz commented 2 months ago

hi @dwyatte - thank you for the issue!

increasing the capability of the vertex worker like this sounds useful and reasonable

do you have any interest / capacity to contribute this?

here are docs which might be a useful reference, but it would essentially just be adding a field to the config model that specifies the current work pool variables
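
A minimal sketch of what "adding a field to the config model" could look like, using a plain dataclass as a stand-in for the real model. The class name, its existing fields, and the to_job_spec method are all hypothetical and are not prefect-gcp's actual API; the point is only that an optional scheduling field defaulting to None would leave existing work pools unaffected.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VertexJobVariablesSketch:
    """Hypothetical stand-in for the worker's config model."""

    image: str
    machine_type: str = "n1-highmem-2"
    # New optional field; None keeps the current behavior.
    scheduling: Optional[Dict[str, Any]] = None

    def to_job_spec(self) -> Dict[str, Any]:
        """Render the variables into a job spec shaped like the YAML in the issue."""
        spec: Dict[str, Any] = {
            "workerPoolSpecs": {
                "machineSpec": {"machineType": self.machine_type},
                "replicaCount": 1,
                "containerSpec": {"imageUri": self.image},
            }
        }
        if self.scheduling is not None:
            # Copy so later changes to the spec do not mutate the config.
            spec["scheduling"] = dict(self.scheduling)
        return spec


if __name__ == "__main__":
    variables = VertexJobVariablesSketch(
        image="gcr.io/ucaip-test/ucaip-training-test",
        scheduling={"strategy": "FLEX_START", "maxWaitDuration": "1800s"},
    )
    print(variables.to_job_spec())
```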

dwyatte commented 2 months ago

Thanks @zzstoatzz

I or one of my colleagues plan to contribute soon!