apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Forbidden: pods is forbidden while using a service account in GKE #37795

Open dmndru opened 6 months ago

dmndru commented 6 months ago

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes 8.0.0
apache-airflow-providers-google 10.15.0

Apache Airflow version

2.7.3

Operating System

Debian 11

Deployment

Official Apache Airflow Helm Chart

Deployment details

GKE cluster version 1.26.13

What happened

We use the GKEStartPodOperator to run a pod in our GKE clusters and get the following error:

kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': '677bd23e-3885-4057-84cb-cbcfd8bcb4d2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '43677b2c-e54a-4a50-ae9f-79d579f5d98c', 'X-Kubernetes-Pf-Prioritylevel-Uid': '9ac4bb3e-9b09-4aca-912b-6b01d3f002b1', 'Date': 'Tue, 27 Feb 2024 10:48:33 GMT', 'Content-Length': '337'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"7843619910037672257843\" cannot create resource \"pods\" in API group \"\" in the namespace \"staging\": requires one of [\"container.pods.create\"] permission(s).","reason":"Forbidden","details":{"kind":"pods"},"code":403}
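The response body is a Kubernetes Status object, and its message names the identity that the API server saw. Below is a minimal sketch, using only the Python standard library, that pulls the user, verb, resource, and namespace out of the body quoted above. The helper name `denied_identity` and the regular expression are illustrative assumptions, not part of the Airflow or Kubernetes client APIs.

```python
import json
import re

# Body copied verbatim from the 403 response above.
BODY = r'''{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"7843619910037672257843\" cannot create resource \"pods\" in API group \"\" in the namespace \"staging\": requires one of [\"container.pods.create\"] permission(s).","reason":"Forbidden","details":{"kind":"pods"},"code":403}'''

# Hypothetical pattern for the "forbidden" message format seen in this issue.
_PATTERN = re.compile(
    r'User "([^"]+)" cannot (\w+) resource "([^"]+)" '
    r'in API group "[^"]*" in the namespace "([^"]+)"'
)


def denied_identity(body: str) -> dict:
    """Extract who was denied what from a Kubernetes Status JSON body."""
    message = json.loads(body)["message"]
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError(f"unrecognised message format: {message!r}")
    user, verb, resource, namespace = match.groups()
    return {
        "user": user,
        "verb": verb,
        "resource": resource,
        "namespace": namespace,
        # True when the user is a bare number rather than an email address.
        "user_is_numeric_id": user.isdigit(),
    }


if __name__ == "__main__":
    print(denied_identity(BODY))
```

Here the user is a bare number rather than an email address. That may indicate that the API server identified the caller by the service account's unique ID, which would not match a RoleBinding whose subject is the service account's email.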

trace [2024-02-27, 10:48:33 UTC] {pod_manager.py:329} ERROR - Exception when attempting to create Namespaced Pod: { "apiVersion": "v1", "kind": "Pod", "metadata": { "annotations": {}, "labels": { "tier": "staging", "dag_id": "batch", "task_id": "start", "run_id": "manual__2024-02-27T104509.7089670000-57e30e5c0", "kubernetes_pod_operator": "True", "try_number": "1", "airflow_version": "2.7.3", "airflow_kpo_in_cluster": "False" }, "name": "batch-9ef11196", "namespace": "staging" }, "spec": { "affinity": {}, "containers": [ { "args": [ "--name", "batch", "--batch_pg_run_id", "manual__2024-02-27T10:45:09.708967+00:00", "--batch_pg", "input/", "--batch_pg_use_proxy", "True", "--batch_pg_dag_execution_date", "2024-02-27", "--batch_pg_always_use_selenium", "False", "--batch_pg_selenium_on_scrapy_error", "False", "--batch_pg_batch_size", "1000", "--batch_pg", "False" ], "command": [], "env": [], "envFrom": [], "image": "staging_latest", "imagePullPolicy": "Always", "name": "base", "ports": [], "terminationMessagePolicy": "File", "volumeMounts": [] } ], "hostNetwork": false, "initContainers": [], "restartPolicy": "Never", "securityContext": {}, "volumes": [] } } Traceback (most recent call last): File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 324, in run_pod_async resp = self._client.create_namespaced_pod( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 7356, in create_namespaced_pod return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) # noqa: E501 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 7455, in create_namespaced_pod_with_http_info return self.api_client.call_api( ^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api return self.__call_api(resource_path, method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api response_data = self.request( ^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 391, in request return self.rest_client.POST(url, ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 279, in POST return self.request("POST", url, ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 238, in request raise ApiException(http_resp=r) kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': '677bd23e-3885-4057-84cb-cbcfd8bcb4d2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '43677b2c-e54a-4a50-ae9f-79d579f5d98c', 'X-Kubernetes-Pf-Prioritylevel-Uid': '9ac4bb3e-9b09-4aca-912b-6b01d3f002b1', 'Date': 'Tue, 27 Feb 2024 10:48:33 GMT', 'Content-Length': '337'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"7843619910037672257843\" cannot create resource \"pods\" in API group \"\" in the namespace \"staging\": requires one of [\"container.pods.create\"] permission(s).","reason":"Forbidden","details":{"kind":"pods"},"code":403} [2024-02-27, 10:48:33 UTC] {pod.py:1109} ERROR - 'NoneType' object has no attribute 'metadata' Traceback (most recent call last): File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 578, in execute_sync self.pod = self.get_or_create_pod( # must set `self.pod` for `on_kill` 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 538, in get_or_create_pod self.pod_manager.create_pod(pod=pod_request_obj) File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f return self(f, *args, **kw) ^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__ do = self.iter(retry_state=retry_state) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 314, in iter return fut.result() ^^^^^^^^^^^^ File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__ result = fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 354, in create_pod return self.run_pod_async(pod) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 332, in run_pod_async raise e File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 324, in run_pod_async resp = self._client.create_namespaced_pod( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 7356, in create_namespaced_pod return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) # noqa: E501 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 7455, in create_namespaced_pod_with_http_info return self.api_client.call_api( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api return self.__call_api(resource_path, method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api response_data = self.request( ^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 391, in request return self.rest_client.POST(url, ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 279, in POST return self.request("POST", url, ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 238, in request raise ApiException(http_resp=r) kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': '677bd23e-3885-4057-84cb-cbcfd8bcb4d2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '43677b2c-e54a-4a50-ae9f-79d579f5d98c', 'X-Kubernetes-Pf-Prioritylevel-Uid': '9ac4bb3e-9b09-4aca-912b-6b01d3f002b1', 'Date': 'Tue, 27 Feb 2024 10:48:33 GMT', 'Content-Length': '337'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"7843619910037672257843\" cannot create resource \"pods\" in API group \"\" in the namespace \"staging\": requires one of [\"container.pods.create\"] permission(s).","reason":"Forbidden","details":{"kind":"pods"},"code":403} During handling of the above exception, another exception occurred: Traceback (most recent call last): File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 937, in patch_already_checked name=pod.metadata.name, ^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'metadata' [2024-02-27, 10:48:33 UTC] {taskinstance.py:1937} ERROR - Task failed with exception Traceback (most recent call last): File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 578, in execute_sync self.pod = self.get_or_create_pod( # must set `self.pod` for `on_kill` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 538, in get_or_create_pod self.pod_manager.create_pod(pod=pod_request_obj) File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f return self(f, *args, **kw) ^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__ do = self.iter(retry_state=retry_state) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 314, in iter return fut.result() ^^^^^^^^^^^^ File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__ result = fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 354, in create_pod return self.run_pod_async(pod) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 332, in run_pod_async raise e 
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 324, in run_pod_async resp = self._client.create_namespaced_pod( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 7356, in create_namespaced_pod return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) # noqa: E501 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py", line 7455, in create_namespaced_pod_with_http_info return self.api_client.call_api( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api return self.__call_api(resource_path, method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api response_data = self.request( ^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 391, in request return self.rest_client.POST(url, ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 279, in POST return self.request("POST", url, ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", line 238, in request raise ApiException(http_resp=r) kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': '677bd23e-3885-4057-84cb-cbcfd8bcb4d2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '43677b2c-e54a-4a50-ae9f-79d579f5d98c', 'X-Kubernetes-Pf-Prioritylevel-Uid': '9ac4bb3e-9b09-4aca-912b-6b01d3f002b1', 'Date': 'Tue, 27 Feb 2024 10:48:33 GMT', 
'Content-Length': '337'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"7843619910037672257843\" cannot create resource \"pods\" in API group \"\" in the namespace \"staging\": requires one of [\"container.pods.create\"] permission(s).","reason":"Forbidden","details":{"kind":"pods"},"code":403} During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/google/cloud/operators/kubernetes_engine.py", line 548, in execute return super().execute(context) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 570, in execute return self.execute_sync(context) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 629, in execute_sync self.cleanup( File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 839, in cleanup raise AirflowException( airflow.exceptions.AirflowException: Pod batch-pagegrabber-crawler-9ef11196 returned a failure. remote_pod: None [2024-02-27, 10:48:33 UTC] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=batch, task_id=start, execution_date=20240227T104509, start_date=20240227T104831, end_date=20240227T104833 [2024-02-27, 10:48:33 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 19 for task start (Pod batch-9ef11196 returned a failure. remote_pod: None; 24)

What you think should happen instead

No response

How to reproduce

  1. create a GKE cluster
  2. install Airflow 2.7.3 into the cluster
  3. create a namespace in the cluster
  4. create a GCP service account and grant the Kubernetes Engine Viewer role to the service account
  5. grant the service account permission to create pods in the namespace by creating a Role and a RoleBinding
  6. run a pod on the cluster using the GKEStartPodOperator:
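    from airflow.providers.google.cloud.operators.kubernetes_engine import GKEStartPodOperator

    # GCP_LOCATION (the cluster's zone or region) is defined elsewhere in the DAG file.
    pod_batch = GKEStartPodOperator(
        task_id="start_batch",
        gcp_conn_id="my_gcp_conn",
        name="batch",
        image="myimage",
        image_pull_policy="Always",
        startup_timeout_seconds=1800,
        is_delete_operator_pod=True,
        project_id="cluster1_project_id",
        cluster_name="cluster1",
        location=GCP_LOCATION,
        namespace="staging",
        labels={"tier": "staging"},
    )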
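Step 5 can be sketched as a namespaced Role plus a RoleBinding. The snippet below only builds the two manifests as plain Python dicts and prints them as a JSON `List`; it does not talk to any cluster. The role name, the verb list, the `pods/log` rule, and the service account email are assumptions for illustration, since the issue does not show the actual manifests. GKE RBAC refers to a Google service account as a subject of kind `User`.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def pod_creator_rbac(namespace: str, subject_name: str,
                     role_name: str = "pod-creator") -> list:
    """Build a Role and RoleBinding that allow managing pods in one namespace."""
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            # Assumed verb set: create the pod, watch it, and clean it up.
            {"apiGroups": [""], "resources": ["pods"],
             "verbs": ["create", "get", "list", "watch", "delete"]},
            # Assumed: reading container logs from the pod.
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "User", "name": subject_name, "apiGroup": RBAC_GROUP},
        ],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }
    return [role, binding]


if __name__ == "__main__":
    # Hypothetical service account email; replace with the real one.
    items = pod_creator_rbac(
        "staging", "airflow@example-project.iam.gserviceaccount.com"
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. Both objects are scoped to the `staging` namespace, so they grant nothing elsewhere in the cluster.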

Anything else

The error can be fixed by granting the Kubernetes Engine Developer role to the service account, but that role applies cluster-wide, and we need to grant permissions in a single namespace only.

SamWheating commented 6 months ago

Since you're running within the cluster already, could you just use a regular KubernetesPodOperator?

I'm not super familiar with the GKE-specific operators, but I believe they go through the public-internet GKE APIs rather than just talking to the cluster-local API server, hence the additional IAM role requirement.

dmndru commented 6 months ago

Yes, the GKEStartPodOperator goes through the public endpoint. Unfortunately, we can't use the KubernetesPodOperator since we also need to create pods in other clusters.

avkudryashov commented 1 month ago

Fixed by adding the https://www.googleapis.com/auth/userinfo.email scope to the scopes of the Google Cloud connection:

{
  "conn_type": "google_cloud_platform",
  "extra": {
    "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email"
  }
}
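One way to apply this without editing the connection in the UI is to supply it as a JSON-serialized connection in an `AIRFLOW_CONN_<CONN_ID>` environment variable. Below is a minimal sketch using only the standard library. The helper names are hypothetical, the connection id `my_gcp_conn` is taken from the reproduction steps, and the snippet assumes the connection carries no other fields (key file, project, and so on), which a real deployment will likely need.

```python
import json

CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"
USERINFO_EMAIL = "https://www.googleapis.com/auth/userinfo.email"


def gcp_connection_json(scopes: list) -> str:
    """Serialize a Google Cloud connection whose scopes are comma-joined."""
    return json.dumps(
        {
            "conn_type": "google_cloud_platform",
            "extra": {
                # Same key as in the comment above; scopes are comma-separated.
                "extra__google_cloud_platform__scope": ",".join(scopes),
            },
        }
    )


def connection_env(conn_id: str, scopes: list) -> dict:
    """Return the environment variable that defines the connection."""
    return {f"AIRFLOW_CONN_{conn_id.upper()}": gcp_connection_json(scopes)}


if __name__ == "__main__":
    for name, value in connection_env(
        "my_gcp_conn", [CLOUD_PLATFORM, USERINFO_EMAIL]
    ).items():
        print(f"{name}={value}")
```

With the email scope present, the token carries the service account's email, so a RoleBinding that names the email as its `User` subject should match the caller.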