apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/

https://spark.apache.org/

Apache License 2.0

Secure HDFS Support #414

Closed. ifilonenko closed this 7 years ago

ifilonenko commented 7 years ago

What changes were proposed in this pull request?

This is the ongoing work of setting up Secure HDFS interaction with Spark-on-K8S. The architecture is discussed in this community-wide Google doc. This initiative can be broken down into 4 stages.

STAGE 1

[x] Detecting the HADOOP_CONF_DIR environment variable and using ConfigMaps to store all Hadoop config files locally, while also setting HADOOP_CONF_DIR locally in the driver / executors
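
A minimal sketch of the Stage 1 idea, not the PR's actual Scala code: read HADOOP_CONF_DIR on the submitting machine, pack every file in that directory into a ConfigMap manifest, and produce the environment entry that points HADOOP_CONF_DIR at the mount path inside the pod. The function names, the ConfigMap name, and the mount path are hypothetical and chosen only for illustration.

```python
import os
from pathlib import Path


def detect_hadoop_conf_dir(env=os.environ):
    """Return HADOOP_CONF_DIR from the submitter's environment, or None if unset."""
    return env.get("HADOOP_CONF_DIR")


def build_hadoop_config_map(conf_dir, name):
    """Build a ConfigMap manifest (plain dict) with one entry per regular file in conf_dir."""
    data = {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(conf_dir).iterdir())
        if path.is_file()
    }
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},  # hypothetical name supplied by the caller
        "data": data,
    }


def hadoop_conf_env(mount_path):
    """Container env entry that sets HADOOP_CONF_DIR to the ConfigMap mount path."""
    return {"name": "HADOOP_CONF_DIR", "value": mount_path}
```

The manifest is kept as a plain dict so the sketch has no client-library dependency; a real submission client would hand the equivalent object to the Kubernetes API and mount it into both the driver and the executor pods.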

STAGE 2

[x] Grabbing the TGT from the LTC (local ticket cache) or using a keytab + principal, and creating a DT (delegation token) that will be mounted as a secret
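
The "mounted as a secret" half of Stage 2 can be sketched as follows, assuming the delegation token has already been obtained and serialized to bytes (the Kerberos login and token fetch are out of scope here). Kubernetes requires values under a Secret's `data` field to be base64-encoded. The function names, secret name, key, and mount directory are hypothetical. `HADOOP_TOKEN_FILE_LOCATION` is the environment variable that Hadoop's UserGroupInformation reads to load a token file; whether this PR wires the token through that variable is an assumption.

```python
import base64
import posixpath


def build_delegation_token_secret(token_bytes, name, key):
    """Build a Secret manifest (plain dict) holding the serialized delegation token."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},  # hypothetical name supplied by the caller
        "type": "Opaque",
        # Secret `data` values must be base64-encoded strings.
        "data": {key: base64.b64encode(token_bytes).decode("ascii")},
    }


def token_file_env(mount_dir, key):
    """Container env entry pointing Hadoop at the token file inside the mounted secret.

    Assumption: the pod relies on HADOOP_TOKEN_FILE_LOCATION to find the token.
    """
    return {
        "name": "HADOOP_TOKEN_FILE_LOCATION",
        "value": posixpath.join(mount_dir, key),
    }
```

Each key of a mounted secret appears as a file under the mount directory, which is why the token path is simply the mount directory joined with the key.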

STAGE 3

[x] Driver + Executor Logic

How was this patch tested?

E2E Integration tests
- [x] Stage 1
- [x] Stage 2
- [x] Stage 3
Unit tests
- [x] Stage 1
- [x] Stage 2
- [x] Stage 3

Docs and Error Handling?

[x] Docs
[x] Error Handling

ifilonenko commented 7 years ago

rerun unit tests please

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

rerun unit tests please

kimoonkim commented 7 years ago

@ifilonenko and I talked offline. I am doing a preliminary review on this end-to-end prototype. After this review, we want to break this into smaller PRs and add unit tests to them.

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

rerun unit tests please

ifilonenko commented 7 years ago

rerun integration tests please

erikerlandson commented 7 years ago

If this is still having integration-test problems, rebasing from latest branch-2.2-kubernetes should fix that.

ifilonenko commented 7 years ago

Fully functional Secure HDFS support. Adding an extra integration test for $kinit, to complement the --keytab login, and unit tests to complete the PR.

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

@kimoonkim @foxish @erikerlandson Please review before I merge into the hdfs branch, which should be rebased onto branch-2.2.

ifilonenko commented 7 years ago

Partial mocking of UGI functions has been done, with the exception of the FileSystem portion in the KeytabResolverStep.

Garbage Collection of the secret post job is already handled by the Client.scala OwnerReference.
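
For readers unfamiliar with the mechanism: Kubernetes garbage-collects an object once every owner listed in its `metadata.ownerReferences` has been deleted. A minimal sketch of attaching such a reference to a secret manifest is shown here. It assumes the owner is the driver pod, which this thread does not state explicitly, and the function name is hypothetical.

```python
import copy


def with_owner_reference(manifest, owner_name, owner_uid):
    """Return a copy of manifest with an ownerReference to a pod appended.

    Assumption: the owner is the driver pod. The uid must be the uid the API
    server assigned to that pod when it was created.
    """
    owned = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    refs = owned.setdefault("metadata", {}).setdefault("ownerReferences", [])
    refs.append(
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "name": owner_name,
            "uid": owner_uid,
            "controller": True,
        }
    )
    return owned
```

Because the owner's uid is only known after the owner exists, the owner has to be created before dependent resources such as the secret can reference it.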

Current failures in integration tests are due to issues found after rebasing PRs. They will be addressed before this is ready for merging.

ifilonenko commented 7 years ago

rerun integration tests please

ifilonenko commented 7 years ago

rerun integration tests please

kimoonkim commented 7 years ago

The latest commit addressed most of my comments. Looks great to me. Thanks @ifilonenko for the work so far.

ifilonenko commented 7 years ago

@erikerlandson after all tests pass, can you give the final okay before merge?

erikerlandson commented 7 years ago

LGTM, and passing CI. This is good to merge when we're ready!

foxish commented 7 years ago

Let's merge after cutting the new release and tagging.

ifilonenko commented 7 years ago

Important note: this PR will require refactoring upon merging because the most recent commits include renaming and unit test additions to the KubernetesSchedulerBackend. These changes will be handled on the hdfs-kerberos-support branch directly.