confluentinc / cp-helm-charts

The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.
https://cnfl.io/getting-started-kafka-kubernetes
Apache License 2.0

Kafka, Schema Registry, and Zookeeper charts all have permissions issues out of the box on EKS #548

Open mcgrawia opened 3 years ago

mcgrawia commented 3 years ago

Hi confluent team,

My team and I encountered user permissions issues with the kafka, schema registry, and zookeeper charts when using them out of the box on EKS. Here are the issues we saw:

Issues

1. Kafka

Logs:

kubectl logs -f messaging-0 -c cp-kafka-broker
[2021-07-19 19:21:44,415] INFO Log directory /opt/kafka/data-0/logs not found, creating it. (kafka.log.LogManager)
[2021-07-19 19:21:44,417] ERROR Failed to create or validate data directory /opt/kafka/data-0/logs (kafka.server.LogDirFailureChannel)
java.io.IOException: Failed to create data directory /opt/kafka/data-0/logs
    at kafka.log.LogManager.$anonfun$createAndValidateLogDirs$1(LogManager.scala:181)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:920)
    at kafka.log.LogManager.createAndValidateLogDirs(LogManager.scala:172)
    at kafka.log.LogManager.<init>(LogManager.scala:89)
    at kafka.log.LogManager$.apply(LogManager.scala:1289)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:476)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
    at kafka.Kafka$.main(Kafka.scala:82)
    at kafka.Kafka.main(Kafka.scala)
[2021-07-19 19:21:44,426] ERROR Shutdown broker because none of the specified log dirs from /opt/kafka/data-0/logs can be created or validated (kafka.log.LogManager)

It looks like the directory is owned by root, but the pod is running as user 1000:

kubectl exec -it messaging-0 -c cp-kafka-broker -- ls -la /opt/kafka/data-0/
total 20
drwxr-xr-x 3 root root  4096 Jul 19 15:15 .
drwxr-xr-x 3 root root    20 Jul 19 19:26 ..
drwx------ 2 root root 16384 Jul 19 15:15 lost+found

The fix: To get the pod to start, we needed to add this to the pod's spec:

securityContext:
  fsGroup: 1000
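
For context, here is a minimal sketch of where that setting sits in a StatefulSet manifest. It is a fragment, not a complete manifest: the StatefulSet name is assumed from the pod name messaging-0, and all other fields are elided. With fsGroup set at the pod level, Kubernetes changes the group ownership of supported volumes to that GID at mount time, which is what would let the non-root user write to a directory that is otherwise owned by root.

```yaml
# Sketch only: pod-level securityContext inside a StatefulSet template.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: messaging              # assumed from the pod name messaging-0
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000          # matches the user the pod runs as (1000)
      containers:
        - name: cp-kafka-broker
          # image, ports, volumeMounts, etc. stay unchanged
```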

2. Zookeeper

Logs:

kubectl logs -f messaging-cp-zookeeper-0 -c cp-zookeeper-server
===> User
uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
===> Configuring ...
[Errno 13] Permission denied: '/var/lib/zookeeper/data/myid'
Command [/usr/local/bin/dub template /etc/confluent/docker/myid.template /var/lib/zookeeper/data/myid] FAILED !

Again, it looks like the directory is owned by root:

kubectl exec -it messaging-cp-zookeeper-0 -c cp-zookeeper-server -- ls -la /var/lib/zookeeper/data
total 20
drwxr-xr-x 3 root    root     4096 Jul 19 19:33 .
drwxr-x--- 4 appuser appuser    29 Feb  4 21:07 ..
drwx------ 2 root    root    16384 Jul 19 19:33 lost+found

The fix: To get the pod to start, we needed to add this to the pod's spec:

securityContext:
  fsGroup: 1000

3. Schema registry

The pod fails to start at all with ContainerCannotRun:

kubectl describe pod messaging-schema-registry-84c89694bf-q84jp
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Warning  Failed     3s (x3 over 17s)  kubelet            Error: failed to start container "cp-schema-registry-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: chdir to cwd ("/home/appuser") set in config.json failed: permission denied: unknown

The fix: Override the securityContext in the chart's default values.yaml to use user 1000:

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
  runAsNonRoot: true

instead of:

securityContext:
  runAsUser: 10001
  runAsGroup: 10001
  fsGroup: 10001
  runAsNonRoot: true
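
If the charts are installed through the umbrella cp-helm-charts chart, an override along these lines could go in a custom values file. This is a sketch under assumptions: it assumes the subchart is keyed as cp-schema-registry and reads a top-level securityContext value, as the defaults quoted above suggest, and the file name is hypothetical.

```yaml
# my-values.yaml (hypothetical file name)
# Assumes the Schema Registry subchart is keyed as cp-schema-registry.
cp-schema-registry:
  securityContext:
    runAsUser: 1000      # appuser UID in the cp-* images
    runAsGroup: 1000
    fsGroup: 1000
    runAsNonRoot: true
```

Such a file would be passed with Helm's -f flag, for example helm upgrade --install <release> confluentinc/cp-helm-charts -f my-values.yaml. It only has an effect if the chart version being installed actually templates securityContext.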

Related?

I think this is also related to the following issues: https://github.com/confluentinc/cp-helm-charts/issues/501 https://github.com/confluentinc/cp-helm-charts/issues/497

Do you accept PRs? Happy to submit one to fix these.

Thanks

archi13 commented 3 years ago

It seems to me that the currently published charts don't include the securityContext sections.

helm template -f values.yaml .

produces YAML with securityContext (given the command runs inside the cloned repo), while

helm template -f values.yaml confluentinc/cp-helm-charts

produces YAML without securityContext.

seboudry commented 2 years ago

Hi!

I'm having the same issue: the user is not defined correctly in the chart. Since the 6.2.2+ releases of the Confluent images, a startup check verifies the permissions on the directory, which leads to a CrashLoopBackOff with:

===> User
uid=10001 gid=10001 groups=10001
===> Configuring ...
Command [/usr/local/bin/dub path /etc/schema-registry/ writable] FAILED !

Same as https://github.com/confluentinc/schema-registry-images/pull/48#issuecomment-991336270 and https://github.com/confluentinc/kafka-images/issues/127

Since January 2020, the appuser user has been used by default in all cp-* images and is defined like this:

uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)

See PRs https://github.com/confluentinc/common-docker/pull/54 and https://github.com/confluentinc/schema-registry-images/pull/13

So the securityContext sections in all Helm charts in this repository aren't correct, as they define the user as 10001:10001.

Also, in my opinion, being able to define securityContext properties is good, but the chart shouldn't set default values, since it's up to deployment teams to know their cluster configuration. Even more so if the image already runs as a non-root user.

haghayeghh commented 11 months ago

You can override the default securityContext in values.yaml like this:

securityContext:
  runAsUser: 0