Open batCoder95 opened 4 years ago
IRSA requires an AWS SDK version that supports assume-web-identity-role. I have not checked the dependencies. If you have time to help, feel free to take a look. Additionally, S3 is a little different, because Spark accesses it through the Hadoop S3 package (hadoop-aws). You should check the version mapping and find which AWS SDK version it is built against.
@batCoder95
I was able to run SparkApplications by setting an IAM role for the service account.
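For readers following along, setting an IAM role for a service account on EKS is usually done with the `eks.amazonaws.com/role-arn` annotation. The manifest below is only a sketch: the service account name and namespace are taken from the question in this thread, while the account ID and role name in the ARN are placeholders, not values anyone here posted. It also assumes the cluster has an IAM OIDC provider and that the role's trust policy allows this service account.

```yaml
# Sketch of an IRSA-enabled service account (assumed setup, not the commenter's exact manifest)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: svc-spark-account        # name used in the question's SparkApplication
  namespace: dev                 # namespace used in the question
  annotations:
    # Placeholder ARN: replace the account ID and role name with your own
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-spark-s3-role
```

The driver and executor pods then reference this account through `spec.driver.serviceAccount` and `spec.executor.serviceAccount` in the SparkApplication.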
dependencies
I did not check other combinations of versions, but I think Hadoop 3.2.1 or later is required, because hadoop-aws 3.2.0 is built against an older AWS SDK that does not support assume-web-identity-role.
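As a hedged illustration of how the SDK's web identity support can be wired into S3A: the fragment below sets the standard `fs.s3a.aws.credentials.provider` option to the AWS SDK v1 class `com.amazonaws.auth.WebIdentityTokenCredentialsProvider`. Whether this setting is needed, and whether the class is present at all, depends on the hadoop-aws and aws-java-sdk versions bundled in your image, so treat it as an assumption to verify rather than a confirmed fix.

```yaml
# Fragment of a SparkApplication spec (sketch, to be merged into an existing manifest)
spec:
  sparkConf:
    # Assumption: the image ships an AWS SDK new enough to include this provider class
    "spark.hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
```

If the class cannot be found at runtime, that is a sign the bundled SDK predates web identity support, which matches the version constraint described above.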
references
Hi all,
I wanted to check whether it is possible to define an AWS IAM role to be attached to the driver and executor pods in the SparkApplication YAML file. As I understand it, these pods currently inherit the role from the EKS node group, but I would like to specify my own custom role in the YAML file. Can somebody please advise whether this is possible?
I tried passing the IAM role / OIDC service account in the manner below, but it did not work. It fails with an S3 access denied (403) error.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-pyspark-app
  namespace: dev
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "coqueirotree/spark-py"
  imagePullPolicy: Always
  mainApplicationFile: s3a://my-bucket/TestFile.py
  sparkVersion: "3.0.0"
  sparkConf:
    "spark.kubernetes.driverEnv.serviceAccount": "svc-spark-account"
    "spark.kubernetes.driverEnv.serviceAccountName": "svc-spark-account"
    "spark.kubernetes.authenticate.driver.serviceAccount": "svc-spark-account"
    "spark.kubernetes.authenticate.driver.serviceAccountName": "svc-spark-account"
    "spark.kubernetes.executorEnv.serviceAccount": "svc-spark-account"
    "spark.kubernetes.executorEnv.serviceAccountName": "svc-spark-account"
    "spark.kubernetes.authenticate.executor.serviceAccount": "svc-spark-account"
    "spark.kubernetes.authenticate.executor.serviceAccountName": "svc-spark-account"
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: svc-spark-account
    serviceAccountName: svc-spark-account
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: svc-spark-account
    serviceAccountName: svc-spark-account
Can someone please advise if I should do something differently in this case? Thanks in advance :)