As you mentioned above, use of hostPath isn't ideal. Is the problem installing a secret for each node?
I can imagine creating a secret per node using a unique name. Perhaps including the cluster node names in secret names.
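A minimal sketch of what such a per-node secret could look like (the node name kube-node-1 and the keytab file name are hypothetical, just to illustrate the naming scheme):

```yaml
# Hypothetical per-node secret: the cluster node name is embedded in the
# secret name, so every node gets its own keytab under a distinct name.
apiVersion: v1
kind: Secret
metadata:
  name: hdfs-keytab-kube-node-1
type: Opaque
data:
  hdfs.keytab: <base64-encoded keytab for the datanode principal on kube-node-1>
```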
The problem is more about how the k8s DaemonSet yaml file refers to a different secret name per node. AFAIK, there is no macro mechanism for expanding a common name in a k8s resource file into multiple different names depending on which cluster node each DaemonSet pod lands on. I was hoping a Helm chart might have something, but it doesn't seem so.
Anyone know better? :-)
@kimoonkim can you elaborate on what you meant by "keytab files on different cluster nodes should have different passwords as file content"? Why should different nodes have different keytab files?
Question - what happens when the pod host-name changes because an hdfs-data-node is killed and respawned? Would that make the KDC reject the newly spawned hdfs-data-node?
If you have access to a 1.7.x cluster, can you try using the local PVs? They're alpha, but if we need specific features to support use-cases like this, now would be the time to find those out.
> @kimoonkim can you elaborate on what you meant by "keytab files on different cluster nodes should have different passwords as file content"? Why should different nodes have different keytab files?
Sure. In Kerberos, it is the convention that each service daemon instance on a different host has a unique principal account, formed by including the host name in the principal name. E.g. datanode 1 on hostA will have hdfs/hostA@MYREALM as the account, and datanode 2 on hostB will have hdfs/hostB@MYREALM. Since they are different accounts, they are supposed to have different passwords. This way, when a single keytab file is compromised by an intruder, you only lose the data managed by that particular daemon instance. (Or that particular daemon instance can become a phishing server.) In contrast, if you use a single account and password for all daemon instances, you can lose all data across the cluster nodes.
> Question - what happens when the pod host-name changes because an hdfs-data-node is killed and respawned? Would that make the KDC reject the newly spawned hdfs-data-node?
Datanodes use hostNetwork, so they are not associated with the pod names as identities. Instead, they identify with the underlying cluster node host names. When a datanode pod is recreated, it will still use the correct cluster node host name in its principal, so it won't be rejected by the KDC.
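For reference, a minimal sketch of the relevant part of the datanode DaemonSet pod template (the container name and image are illustrative, not the actual chart contents):

```yaml
# With hostNetwork, the datanode sees the cluster node's real host name and
# can authenticate as hdfs/<node-host-name>@MYREALM regardless of pod restarts.
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: datanode
        image: my-hadoop-datanode:latest   # illustrative image
```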
> If you have access to a 1.7.x cluster, can you try using the local PVs? They're alpha, but if we need specific features to support use-cases like this, now would be the time to find those out.
I like the suggestion. I think using local PVs is a better solution in terms of access control. I'd love to try it out. I didn't know 1.7 has it as alpha. Thanks!
@erikerlandson @liyinan926 @foxish
Instead of using hostPath for the keytab files, I wonder if we can mount a single K8s secret or PVC that the user specifies. The single secret or PVC will be shared by the namenode and all the datanode pods. It will have all the keytab files stored in it, named after the cluster node host names.
Then each pod will use an init container to copy the keytab file matching its cluster node host name to the right path that the daemon is expecting.
A K8s secret has a 1 MB size limit. I checked a few keytab files; their size is ~1 KB. So a single secret can support up to roughly 1000 keytab files. If your cluster is larger, you can switch to a PVC instead, backed by whatever PV you like. I plan to test this using an NFS PV.
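As a rough sketch, assuming hypothetical node names kube-node-1 and kube-node-2, the shared secret could key each keytab by the cluster node host name:

```yaml
# Hypothetical shared secret holding one keytab per cluster node, keyed by
# the node host name. At ~1 KB per keytab, hundreds of nodes fit well
# within the 1 MB secret size limit.
apiVersion: v1
kind: Secret
metadata:
  name: hdfs-keytabs
type: Opaque
data:
  kube-node-1.keytab: <base64-encoded keytab for hdfs/kube-node-1@MYREALM>
  kube-node-2.keytab: <base64-encoded keytab for hdfs/kube-node-2@MYREALM>
```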
A nice thing about this approach is that access control of a secret or PVC, e.g. using RBAC, is more streamlined than restricting the hostPath privilege.
What do you think? Any concerns?
I personally like the idea of using a secret. The only concern I have is that once the secret gets mounted to a pod, it makes every keytab file locally available to that pod. We can certainly copy the right keytab file out and set HADOOP_TOKEN_FILE_LOCATION to point to it. But what about the rest of the keytab files; should we delete them from the mount path?
Since this is a single shared secret or PV, deleting other keytab files may affect other pods.
Ideally, each pod would want to unmount the volume after it copied the right keytab file. That might not be easy, or it might require a non-default service account.
Fundamentally, this approach assumes that the shared secret and the HDFS pods are well guarded by RBAC. Otherwise, you end up exposing keytab files. I understand that risking exposure of only one keytab at a time is better than the alternative. But it's likely the cluster admin would set up RBAC for all of them in the right way anyway.
Sorry for the confusion. I meant deleting the locally mounted keytab files. This will not change the secret itself.
@liyinan926 Would it even be possible to delete only the locally mounted files without affecting the secret itself? Maybe the pod would get an access error? What about a PVC volume? Is it possible to delete only the locally mounted files without affecting the remote files?
@foxish correct me if I'm wrong. I believe it's possible to delete locally mounted files (stored in memory locally) without affecting the secret itself, unless the secret object gets updated through the api server.
@liyinan926 I think we can mount the secret only in the init-container, because volumeMounts is container-specific. (It is spec.containers[].volumeMounts[].)
Then the init-container will copy the matching keytab file to a mount path of an emptyDir volume that is mounted both by the init-container and the datanode container. After the init-container is done, the secret will be unmounted automatically.
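A rough sketch of how that could look in the datanode pod template; the secret name hdfs-keytabs, the mount paths, and the copy command are assumptions for illustration, not the actual contents of the PR:

```yaml
# Sketch: mount the shared keytab secret only in the init container, copy the
# keytab matching this node's host name into an emptyDir, and share only the
# emptyDir with the datanode container. When the init container finishes,
# the secret volume is no longer mounted in any running container.
spec:
  template:
    spec:
      hostNetwork: true
      initContainers:
      - name: copy-keytab
        image: busybox
        command: ["sh", "-c",
                  "cp /keytab-secret/$(hostname -f).keytab /keytab/hdfs.keytab"]
        volumeMounts:
        - name: keytab-secret
          mountPath: /keytab-secret
        - name: keytab
          mountPath: /keytab
      containers:
      - name: datanode
        image: my-hadoop-datanode:latest   # illustrative image
        volumeMounts:
        - name: keytab                     # only the single copied keytab is visible here
          mountPath: /etc/security/keytabs
      volumes:
      - name: keytab-secret
        secret:
          secretName: hdfs-keytabs
      - name: keytab
        emptyDir: {}
```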
I think #24 is already doing something similar for copying jsvc to the datanode container using emptyDir.
Switched #24 to the secret/init-container approach and it seems to work! Thanks @erikerlandson, @foxish and @liyinan926 for the inspiration.
@kimoonkim Cool! Did you try deleting the unwanted keytab files?
No, we didn't need to. We mount the secret only in the init-container. After the init-container finishes, the other keytab files will be unmounted and disappear from the pod. So no need to delete them. See the previous comment or the latest commit for details.
Ah, I didn't see it. This is great!
Vanilla Hadoop is not secure without Kerberos. Intruders can easily use custom client code to present false user names and steal or destroy data.
I am sending a PR soon. The main design ideas are:
Use jsvc to bind to privileged ports, and use a k8s initContainer to copy jsvc in place. We use a K8s secret for (3), so it is easy to apply access control using RBAC. A secret has a size limit of 1 MB and each keytab file is less than 1 KB, so a secret can hold up to 1000 keytab files. If the cluster is larger, then we can switch to a PVC backed by NFS, etc.
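For the jsvc part, a sketch of the same pattern (the image and mount paths are placeholders, not the actual PR contents): an init container that ships the jsvc binary copies it into an emptyDir, which the datanode container mounts where its start script expects jsvc.

```yaml
# Sketch: stage the jsvc binary via an init container and an emptyDir so the
# datanode container can bind to privileged ports as a secure datanode.
spec:
  template:
    spec:
      initContainers:
      - name: copy-jsvc
        image: my-jsvc:latest              # placeholder image containing /jsvc/jsvc
        command: ["cp", "/jsvc/jsvc", "/jsvc-dest/jsvc"]
        volumeMounts:
        - name: jsvc
          mountPath: /jsvc-dest
      containers:
      - name: datanode
        image: my-hadoop-datanode:latest   # illustrative image
        volumeMounts:
        - name: jsvc
          mountPath: /usr/local/hadoop/jsvc   # placeholder path the start script checks
      volumes:
      - name: jsvc
        emptyDir: {}
```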
For more details on the how-to, see the README.md files in the PR. Please feel free to ask questions or make suggestions about the design approach. The prototype seems to work :-) From the namenode log:
I was also able to run Spark jobs against this secure HDFS, using apache-spark-on-k8s PR #414 that has secure Hadoop support.
17/08/29 14:57:30 INFO Client: Application spark-dfsrw finished.
From the driver log:
The core-site.xml and hdfs-site.xml in /usr/local/hadoop/conf had the following content:
@ifilonenko @liyinan926 @foxish