aws-samples / eks-spark-benchmark

Performance optimization for Spark running on Kubernetes
Apache License 2.0

Custom IAM Role for Driver and Executor Pods? #10

Open batCoder95 opened 4 years ago

batCoder95 commented 4 years ago

Hi all,

I wanted to check whether it is possible to define an AWS IAM role to attach to the driver and executor pods in the Spark application YAML file. As I understand it, these pods currently inherit the role from the EKS node group, but I would like to specify my own custom role in the YAML file. Can somebody please advise whether this is possible?

I tried passing the IAM role / OIDC service account as shown below, but it did not work. The job fails with an S3 access denied (403) error.

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-pyspark-app
  namespace: dev
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "coqueirotree/spark-py"
  imagePullPolicy: Always
  mainApplicationFile: s3a://my-bucket/TestFile.py
  sparkVersion: "3.0.0"
  sparkConf:
    "spark.kubernetes.driverEnv.serviceAccount": "svc-spark-account"
    "spark.kubernetes.driverEnv.serviceAccountName": "svc-spark-account"
    "spark.kubernetes.authenticate.driver.serviceAccount": "svc-spark-account"
    "spark.kubernetes.authenticate.driver.serviceAccountName": "svc-spark-account"
    "spark.kubernetes.executorEnv.serviceAccount": "svc-spark-account"
    "spark.kubernetes.executorEnv.serviceAccountName": "svc-spark-account"
    "spark.kubernetes.authenticate.executor.serviceAccount": "svc-spark-account"
    "spark.kubernetes.authenticate.executor.serviceAccountName": "svc-spark-account"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: svc-spark-account
    serviceAccountName: svc-spark-account
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: svc-spark-account
    serviceAccountName: svc-spark-account
```

Can someone please advise if I should do something differently in this case? Thanks in advance :)
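For context, IRSA (IAM Roles for Service Accounts) is normally wired up by annotating the Kubernetes service account itself with the IAM role ARN, rather than through sparkConf keys. A minimal sketch of such a service account follows; the account ID and role name are placeholders, not values from this thread:

```yaml
# Minimal IRSA sketch: the service account referenced by the SparkApplication
# carries the role ARN annotation. Account ID and role name are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: svc-spark-account
  namespace: dev
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-spark-s3-role
```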

Jeffwan commented 4 years ago

IRSA requires AWS SDK support for assume-role-with-web-identity. I have not checked the dependencies; if you have time to help, feel free to take a look. Additionally, S3 is a little bit different because Spark uses the hadoop-aws package, so you'd better check the version mapping and find a compatible version.
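One way this usually shows up in practice (a sketch based on the dependency point above, not something confirmed in this thread) is pointing the S3A connector at the web-identity credentials provider, so it uses the token that IRSA mounts into the pod:

```yaml
sparkConf:
  # Sketch: tell S3A to authenticate with the web-identity token IRSA injects.
  # WebIdentityTokenCredentialsProvider only exists in newer aws-java-sdk
  # releases, which is why the hadoop-aws / SDK version mapping matters.
  "spark.hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
```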

Jeffwan commented 4 years ago

@batCoder95

hiro-o918 commented 3 years ago

I was able to run SparkApplications by setting an IAM role for the service account.

dependencies

I did not check other combinations of versions, but I think Hadoop 3.2.1 or later is required, because hadoop-aws 3.2.0 is built against an older AWS SDK that does not support assume-role-with-web-identity.
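A rough sketch of pinning matching jars at submit time follows; the exact versions here are assumptions and should be verified against the hadoop-aws POM for your Hadoop release:

```yaml
sparkConf:
  # Assumed versions: hadoop-aws should match the Hadoop build in the image,
  # and aws-java-sdk-bundle should be new enough to ship
  # WebIdentityTokenCredentialsProvider. Verify both against the hadoop-aws POM.
  "spark.jars.packages": "org.apache.hadoop:hadoop-aws:3.2.1,com.amazonaws:aws-java-sdk-bundle:1.11.704"
```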
