apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Add a keytab field to the Spark connection (handle the base64 encoded text value as a credential). #40749

Closed seyoon-lim closed 1 month ago

seyoon-lim commented 3 months ago

Description

Hello,

I would like to propose a feature that I have implemented in SparkSubmitHook for managing Spark connections.

The feature stores the Kerberos principal and keytab in the connection settings.

The basic idea is to store the keytab in the connection as a base64-encoded credential. When a Spark job is submitted, the credential is decoded and written to a file, and that file's path is supplied at submission.
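
To illustrate the idea, here is a minimal sketch of the decode-and-write step. It is not the code from my implementation or from SparkSubmitHook: the function names `write_keytab_from_base64` and `build_kerberos_args` are hypothetical and exist only for this example. It assumes the keytab arrives as a base64 string and that the resulting path is handed to spark-submit through its `--principal` and `--keytab` options.

```python
import base64
import os
import tempfile
from typing import List


def write_keytab_from_base64(encoded_keytab: str) -> str:
    """Decode a base64-encoded keytab and write it to a temp file.

    Hypothetical helper for illustration. Returns the path of the
    written file; the caller is responsible for deleting it.
    """
    # Drop whitespace/newlines that may be introduced when pasting the value.
    compact = "".join(encoded_keytab.split())
    keytab_bytes = base64.b64decode(compact, validate=True)

    # mkstemp creates the file readable and writable only by the owner on POSIX.
    fd, path = tempfile.mkstemp(suffix=".keytab")
    with os.fdopen(fd, "wb") as fh:
        fh.write(keytab_bytes)
    return path


def build_kerberos_args(principal: str, keytab_path: str) -> List[str]:
    """Return the Kerberos-related spark-submit arguments (hypothetical helper)."""
    return ["--principal", principal, "--keytab", keytab_path]
```

A caller would decode the stored value once per submission, append the returned arguments to the spark-submit command, and remove the temporary file after the job has been submitted.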

Use case/motivation

Setting up the keytab on each worker can be challenging, and redeploying it every time it changes is cumbersome. Storing the keytab in the connection reduces the need to deploy it to the workers on every change.

I would appreciate your consideration of this proposal.

Thank you.

Related issues

No response


eladkal commented 3 months ago

If you already have the implementation, it's best to simply open a PR so we can review the code and the suggestion.

seyoon-lim commented 3 months ago

@eladkal Okay, I will make a PR soon. Thank you!