PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
Failed due to a TaskFailedToStart error: CannotPullContainerError #11637

Open yaronlevi opened 9 months ago

yaronlevi commented 9 months ago

Bug summary

Got this error:

Failed due to a TaskFailedToStart error: CannotPullContainerError - failed to resolve ref docker.io/prefecthq/prefect:2.13.5-python3.10: failed to authorize: failed to fetch anonymous token: EOF
Flow run could not be submitted to infrastructure: TaskFailedToStart - CannotPullContainerError: pull image manifest has been retried 1 time(s): failed to resolve ref docker.io/prefecthq/prefect:2.13.5-python3.10: failed to authorize: failed to fetch anonymous token: Get "https://auth.docker.io/token?scope=repository%3Aprefecthq%2Fprefect%3Apull&service=registry.docker.io": EOF
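The failing step in both messages is the anonymous token request to auth.docker.io. As a rough way to check whether that request succeeds from a given network, here is a minimal sketch using only the Python standard library. The helper names `anonymous_token_url` and `can_fetch_anonymous_token` are hypothetical, the endpoint and query parameters are copied from the error text above, and the check assumes the response is a JSON object with a `token` field.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Endpoint and service name copied from the error message in this report.
AUTH_ENDPOINT = "https://auth.docker.io/token"
SERVICE = "registry.docker.io"


def anonymous_token_url(repository: str) -> str:
    """Build the anonymous pull-token URL for a Docker Hub repository."""
    query = urlencode({
        "scope": f"repository:{repository}:pull",
        "service": SERVICE,
    })
    return f"{AUTH_ENDPOINT}?{query}"


def can_fetch_anonymous_token(repository: str, timeout: float = 10.0) -> bool:
    """Return True if the token endpoint answers with JSON containing a token.

    Assumption: a successful response is a JSON object with a "token" key.
    """
    url = anonymous_token_url(repository)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers connection failures, HTTP errors, and non-JSON responses.
        print(f"token fetch failed: {exc!r}")
        return False
    return isinstance(body, dict) and "token" in body
```

Calling `can_fetch_anonymous_token("prefecthq/prefect")` from a machine in the same network as the ECS tasks would show whether the token request completes there. A result from a laptop says little about the task's own network path.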

Link to the run

Reproduction

Prefect Cloud with an ECS push work pool.

Error

No response

Versions

Prefect Cloud

Additional context

No response

phallur commented 3 months ago

Worker 'KubernetesWorker a59877db-07ff-4ba4-abe2-36c607de7349' submitting flow run 'fb8f9a8f-415b-4d46-b447-393e335479de' 12:06:16 PM prefect.flow_runs.worker
Creating Kubernetes job... 12:06:16 PM prefect.flow_runs.worker
Completed submission of flow run 'fb8f9a8f-415b-4d46-b447-393e335479de' 12:06:17 PM prefect.flow_runs.worker
Job 'psi3-edasich-x-z58jm': Pod has status 'Pending'. 12:06:17 PM prefect.flow_runs.worker
Job 'psi3-edasich-x-z58jm': Pod never started. 12:07:17 PM prefect.flow_runs.worker
Job event 'SuccessfulCreate' at 2024-07-05 12:06:17+00:00: Created pod: psi3-edasich-x-z58jm-68rkt 12:07:17 PM prefect.flow_runs.worker
Pod event 'Scheduled' at 2024-07-05 12:06:17.100731+00:00: Successfully assigned default/psi3-edasich-x-z58jm-68rkt to aks-soaap1z3514a-11953152-vmss000001 12:07:17 PM prefect.flow_runs.worker
Pod event 'Pulling' (3 times) at 2024-07-05 12:06:57+00:00: Pulling image "sreg.azurecr.io/sentinel-flow:latest" 12:07:17 PM prefect.flow_runs.worker
Pod event 'Failed' (3 times) at 2024-07-05 12:06:57+00:00: Failed to pull image "reg.azurecr.io/sentinel-flow:latest": failed to pull and unpack image "reg.azurecr.io/sentinel-flow:latest": failed to resolve reference "reg.azurecr.io/sentinel-flow:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://reg.azurecr.io/oauth2/token?scope=repository%3Asentinel-flow%3Apull&service=reg.azurecr.io: 401 Unauthorized 12:07:17 PM prefect.flow_runs.worker
Pod event 'Failed' (3 times) at 2024-07-05 12:06:57+00:00: Error: ErrImagePull 12:07:17 PM prefect.flow_runs.worker
Pod event 'BackOff' (3 times) at 2024-07-05 12:07:11+00:00: Back-off pulling image "reg.azurecr.io/sentinel-flow:latest" 12:07:17 PM prefect.flow_runs.worker
Pod event 'Failed' (3 times) at 2024-07-05 12:07:11+00:00: Error: ImagePullBackOff 12:07:17 PM prefect.flow_runs.worker
Reported flow run 'fb8f9a8f-415b-4d46-b447-393e335479de' as crashed: Flow run infrastructure exited with non-zero status code -1.

I'm getting a similar issue with a deployment that runs on a Kubernetes work pool. Is this a known issue?
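Both reports in this thread fail at "failed to fetch anonymous token", but the trailing detail differs: the ECS run ends in EOF, while the Kubernetes run ends in 401 Unauthorized. A minimal sketch for telling the two apart in log text follows. The function name and the category labels are hypothetical, chosen only for illustration. The labels describe the symptom and do not establish a root cause.

```python
def classify_pull_error(message: str) -> str:
    """Label an image-pull error by the detail that follows the token failure.

    The labels are illustrative, not an official taxonomy:
      "unauthorized"      -> the registry answered the token request with 401
      "connection_closed" -> the token request ended with EOF (no response)
      "unknown"           -> token fetch failed for some other reason
      "other"             -> the message is not a token-fetch failure
    """
    if "failed to fetch anonymous token" not in message:
        return "other"
    if "401 Unauthorized" in message:
        return "unauthorized"
    if message.rstrip().endswith("EOF"):
        return "connection_closed"
    return "unknown"
```

Under this labeling, the original ECS error would be `connection_closed` and the Kubernetes pod event would be `unauthorized`, which suggests the two reports may have different causes even though the surface error looks alike.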