Yolean / kubernetes-kafka

Kafka cluster as Kubernetes StatefulSet, plain manifests and config
Apache License 2.0
1.84k stars 735 forks source link

Kafka init container does not handle kubectl error #246

Open goodhoko opened 5 years ago

goodhoko commented 5 years ago

Hello Solsson .)

If I get this part of init.sh right:

ZONE=$(kubectl get node "$NODE_NAME" -o=go-template='{{index .metadata.labels "failure-domain.beta.kubernetes.io/zone"}}')
      if [ $? -ne 0 ]; then
        SEDS+=("s/#init#broker.rack=#init#/#init#broker.rack=# zone lookup failed, see -c init-config logs/")
      elif [ "x$ZONE" == "x<no value>" ]; then
        SEDS+=("s/#init#broker.rack=#init#/#init#broker.rack=# zone label not found for node $NODE_NAME/")
      else
        SEDS+=("s/#init#broker.rack=#init#/broker.rack=$ZONE/")
        LABELS="$LABELS kafka-broker-rack=$ZONE"
      fi

It's purpose is to try to lookup the zone. The If part then should handle eventual failure of the kubectl command.

However in our case when the kubectl fails with

Error from server (Forbidden): nodes "gke-testing-preemptible-pool-c2ad7969-qr2c" is forbidden: User "system:serviceaccount:devel:default" cannot get nodes at the cluster scope

it fails the whole init container (I guess becase of the set -e above) and Init:CrashLoopBackOff begins. 🔁

I should add that we made no modifications to the init script nor the manifests of this repo.


the init container logs:

goodhoko:~/Workspace/GoOut/Kafka$ k logs kafka-1 -c init-config
+ cp /etc/kafka-configmap/log4j.properties /etc/kafka/
+ KAFKA_BROKER_ID=1
+ SEDS=("s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/")
+ LABELS=kafka-broker-id=1
+ ANNOTATIONS=
+ hash kubectl
++ kubectl get node gke-testing-preemptible-pool-c2ad7969-qr2c '-o=go-template={{index .metadata.labels "failure-domain.beta.kubernetes.io/zone"}}'
Error from server (Forbidden): nodes "gke-testing-preemptible-pool-c2ad7969-qr2c" is forbidden: User "system:serviceaccount:devel:default" cannot get nodes at the cluster scope
+ ZONE=
solsson commented 5 years ago

I think the desired behavior is this:

Right? Hence if you'd prefer to not apply the rbac, patch the init script to not do get node.

goodhoko commented 5 years ago

Yes I can see the objective here but then the

if [ $? -ne 0 ]; then

condition is good for nothing as when it would be true the init script has already exited, right?

goodhoko commented 5 years ago

@solsson I've reated a pull request which removes these checks. Of course, I don't know if there is some more reasoning behind them. If so just close the request. I've created with an attitude of 'better propose something real than just complain in comments.