PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Google Cloud Run Guide fails with error for CloudRunWorkerJobConfiguration #15131

Open feliperyan opened 2 weeks ago

feliperyan commented 2 weeks ago

Bug summary

Running through https://prefecthq.github.io/prefect-gcp/gcp-worker-guide/#google-cloud-run-guide, I got to the end and ran the deployment. The flow run is reported as crashed, with the following logs:

Failed to submit flow run '27f2ed1c-75ab-46a1-8f95-cd643d322e17' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 906, in _submit_run_and_capture_errors
    configuration = await self._get_configuration(flow_run)
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 991, in _get_configuration
    configuration = await self.job_configuration.from_template_and_values(
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 100, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/workers/base.py", line 149, in from_template_and_values
    return cls(**populated_configuration)
  File "/usr/local/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CloudRunWorkerJobConfiguration
job_body
  Job is missing required attributes at the following paths: /apiVersion, /kind, /metadata, /spec (type=value_error)
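
For readers unfamiliar with this message: it lists the top-level keys that the worker expected to find in `job_body` but did not. The check can be sketched roughly as below. This is an illustration only, not prefect-gcp's actual implementation; the function name `missing_required_paths` and the sample body values are hypothetical.

```python
# Illustrative sketch only: NOT prefect-gcp's real validation code.
# The four required keys are taken from the error message above.
REQUIRED_KEYS = ("apiVersion", "kind", "metadata", "spec")


def missing_required_paths(job_body, required=REQUIRED_KEYS):
    """Return '/key'-style paths for required top-level keys absent from job_body."""
    return [f"/{key}" for key in required if key not in job_body]


# An empty body is missing every required key, as in the traceback.
empty_body = {}

# Placeholder values, assumed here purely for illustration.
complete_body = {
    "apiVersion": "run.googleapis.com/v1",
    "kind": "Job",
    "metadata": {"name": "example"},
    "spec": {},
}

print(missing_required_paths(empty_body))
print(missing_required_paths(complete_body))
```

A job body that lacks all four keys, as the error above reports, suggests the populated template did not match what `CloudRunWorkerJobConfiguration` expects.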

Version info (prefect version output)

# Keep in mind I'm self hosting a server via K8s + Helm and this is the output from the client side.

Version:             2.20.3
API version:         0.8.4
Python version:      3.10.14
Git commit:          b8c27aa0
Built:               Thu, Aug 22, 2024 3:13 PM
OS/Arch:             linux/x86_64
Profile:             gke
Server type:         server

Additional context

No response

zzstoatzz commented 2 weeks ago

hi @feliperyan - thanks for the report!

are you able to share the version of prefect-gcp you've installed on your cluster with the helm chart? or at least, did you just use --install-policy always in the prefect worker start command?

feliperyan commented 2 weeks ago

Hi @zzstoatzz thanks for getting back to me. I don't believe I installed prefect-gcp on the cluster as the helm chart doesn't call it out: https://github.com/PrefectHQ/prefect-helm/blob/main/charts/prefect-server/values.yaml

I followed https://prefecthq.github.io/prefect-gcp/gcp-worker-guide/#step-3-deploy-a-cloud-run-worker, which uses --install-policy always, as you can see from the command copied below:

gcloud run deploy prefect-worker --image=prefecthq/prefect:2-latest \
--set-env-vars PREFECT_API_URL=$PREFECT_API_URL,PREFECT_API_KEY=$PREFECT_API_KEY \
--service-account <YOUR-SERVICE-ACCOUNT-NAME> \
--no-cpu-throttling \
--min-instances 1 \
--args "prefect","worker","start","--install-policy","always","--with-healthcheck","-p","<WORK-POOL-NAME>","-t","cloud-run"

feliperyan commented 1 week ago

Hi @zzstoatzz, I was able to get past this issue by creating a custom image and using it in the Helm chart I referenced before.

server:
  image:
    # -- server image repository
    repository: <my_own_image_in_gcp_artifact_registry>
    ## prefect tag is pinned to the latest available image tag at packaging time.  Update the value here to
    ## override pinned tag
    # -- prefect image tag (immutable tags are recommended)
    prefectTag: <relevant_tag>

My image is nothing more than the following, as per https://prefecthq.github.io/prefect-gcp/#build-an-image

FROM prefecthq/prefect:2-python3.11
RUN pip install "prefect-gcp[cloud_storage]"
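
To answer the earlier question about which prefect-gcp version is present, a small check like the one below could be run inside the server or worker container. It is a sketch using only the standard library's `importlib.metadata`; the helper names `installed_version` and `describe` are hypothetical, and it assumes the distributions are named `prefect` and `prefect-gcp` as in the `pip install` line above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def describe(package):
    """Format a one-line report for a distribution name."""
    found = installed_version(package)
    if found is None:
        return f"{package}: not installed"
    return f"{package}: {found}"


# Report on the two distributions discussed in this thread.
for name in ("prefect", "prefect-gcp"):
    print(describe(name))
```

If `prefect-gcp` is reported as not installed in the image the server runs, that would be consistent with the fix described above, where adding it to a custom image resolved the error.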