Open archenroot opened 3 years ago
OK, I understand it is in a waiting state for bookkeeper and zookeeper, after examining the init container:
Image | apachepulsar/pulsar-all:2.7.2
-- | --
Image ID | docker-pullable://apachepulsar/pulsar-all@sha256:96d56238cbf57379b4d09f53e73bfb323787a6d79b36044276a515bb031c2218
Command | ['sh', '-c']
Args | [' until bin/bookkeeper org.apache.zookeeper.ZooKeeperMain -server pulsar-cs-zookeeper:2181 get /admin/clusters/pulsar-mini; do echo "pulsar cluster pulsar-mini isn't initialized yet ... check in 3 seconds ..." && sleep 3; done;']
I wonder whether pulsar-cs-zookeeper is a valid address in the Kubernetes cluster. It should be using the cluster service name, pulsar-mini-zookeeper.svc.cluster.local, and not pulsar-cs-zookeeper.
But I will need to examine pod networking first...
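For reference, Kubernetes normally exposes a Service inside the cluster under `<service>.<namespace>.svc.<cluster-domain>`, so the namespace is part of the full name. A minimal sketch of how that name is put together; the helper name `svc_fqdn` is made up for illustration, and `cluster.local` is assumed as the default cluster domain:

```shell
# Build the cluster-internal DNS name of a Service.
# Usage: svc_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]
# (svc_fqdn is a hypothetical helper; cluster.local is the assumed default domain)
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

svc_fqdn pulsar-mini-zookeeper pulsar
```

With the service and namespace from this deployment, this yields the same name used in the telnet test.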
So I tested connectivity from the zookeeper pod itself via the cluster service URL:
root@pulsar-mini-zookeeper-0:/pulsar# telnet pulsar-mini-zookeeper.pulsar.svc.cluster.local 2181
Trying 10.222.104.7...
Connected to pulsar-mini-zookeeper.pulsar.svc.cluster.local.
Escape character is '^]'.
stats
Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
Clients:
/10.222.104.7:50250[1](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/6/66
Received: 2254
Sent: 2253
Connections: 1
Outstanding: 0
Zxid: 0x1000009a1
Mode: follower
Node count: 5
Connection closed by foreign host.
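The same check can be scripted instead of typed into telnet. A rough sketch that pulls the `Mode:` field out of the `stat` output shown above; `zk_mode` is a made-up helper name, and the commented `nc` call assumes netcat is available in the pod:

```shell
# Extract the "Mode:" value (e.g. leader, follower, standalone)
# from ZooKeeper `stat` output read on stdin.
# (zk_mode is a hypothetical helper name.)
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Against a live server (assumes nc is installed in the pod):
#   printf 'stat' | nc -w 3 pulsar-mini-zookeeper.pulsar.svc.cluster.local 2181 | zk_mode
```

An empty result would suggest the server did not answer the `stat` command at all.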
So that works at least, which means the other pod is the one having a connectivity issue. I searched for the command prefix from the wait-for-zookeeper init container:
command prefix:
[' until bin/bookkeeper org.apache.zookeeper.ZooKeeperMain -server pulsar-cs-zookeeper:2181 get /admin/clusters/pulsar-mini; do echo "pulsar cluster pulsar-mini isn't initialized yet ... check in 3 seconds ..." && sleep 3; done;']
I think this is the issue: -server pulsar-cs-zookeeper:2181
It's not reachable even from the zookeeper pod itself:
root@pulsar-mini-zookeeper-0:/pulsar# ping pulsar-cs-zookeeper
ping: pulsar-cs-zookeeper: Name or service not known
So I searched for where it is coming from:
zangetsu@andromeda ~/proj/infrastructure/k8s-vagrant-multi-node_archenroot/k8s/apache-pulsar $ grep -R "until bin/bookkeeper"
pulsar-helm-chart/charts/pulsar/templates/_autorecovery.tpl:until bin/bookkeeper shell whatisinstanceid; do
pulsar-helm-chart/charts/pulsar/templates/pulsar-cluster-initialize.yaml: until bin/bookkeeper shell whatisinstanceid; do
pulsar-helm-chart/charts/pulsar/templates/_bookkeeper.tpl:until bin/bookkeeper shell whatisinstanceid; do
pulsar-helm-chart/charts/pulsar/templates/_bookkeeper.tpl:until bin/bookkeeper shell whatisinstanceid; do
pulsar-helm-chart/charts/pulsar/templates/broker-statefulset.yaml: until bin/bookkeeper org.apache.zookeeper.ZooKeeperMain -server {{ template "pulsar.configurationStore.connect" . }} get {{ .Values.configurationStoreMetadataPrefix }}/admin/clusters/{{ template "pulsar.cluster.name" . }}; do
pulsar-helm-chart/charts/pulsar/templates/broker-statefulset.yaml: until bin/bookkeeper org.apache.zookeeper.ZooKeeperMain -server {{ template "pulsar.zookeeper.connect" . }} get {{ .Values.metadataPrefix }}/admin/clusters/{{ template "pulsar.cluster.name" . }}; do
pulsar-helm-chart/charts/pulsar/templates/broker-statefulset.yaml: until bin/bookkeeper shell whatisinstanceid; do
Search more:
zangetsu@andromeda ~/proj/infrastructure/k8s-vagrant-multi-node_archenroot/k8s/apache-pulsar $ grep -R "pulsar.zookeeper.connect"
pulsar-helm-chart/charts/pulsar/templates/_zookeeper.tpl:{{- define "pulsar.zookeeper.connect" -}}
So in the template file it's defined as:
{{/*
Define the pulsar zookeeper
*/}}
{{- define "pulsar.zookeeper.connect" -}}
{{$zk:=.Values.pulsar_metadata.userProvidedZookeepers}}
{{- if and (not .Values.components.zookeeper) $zk }}
{{- $zk -}}
{{ else }}
{{- if not (and .Values.tls.enabled .Values.tls.zookeeper.enabled) -}}
{{ template "pulsar.zookeeper.service" . }}:{{ .Values.zookeeper.ports.client }}
{{- end -}}
{{- if and .Values.tls.enabled .Values.tls.zookeeper.enabled -}}
{{ template "pulsar.zookeeper.service" . }}:{{ .Values.zookeeper.ports.clientTls }}
{{- end -}}
{{- end -}}
{{- end -}}
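Reading the template above, `userProvidedZookeepers` is only used when `components.zookeeper` is disabled; otherwise the chart's own zookeeper service and port are rendered. A shell sketch of that branching, as I read it; the function name, argument order and the TLS port value in the example call are made up for illustration:

```shell
# Mimic the branching of the "pulsar.zookeeper.connect" template.
# Usage: zk_connect ZK_COMPONENT_ENABLED USER_ZK TLS SERVICE PORT TLS_PORT
#   ZK_COMPONENT_ENABLED: "true" if components.zookeeper is enabled
#   USER_ZK:              value of pulsar_metadata.userProvidedZookeepers (may be empty)
#   TLS:                  "true" only if tls.enabled AND tls.zookeeper.enabled
# (zk_connect and its argument order are hypothetical, for illustration only.)
zk_connect() {
  if [ "$1" != "true" ] && [ -n "$2" ]; then
    # zookeeper component disabled and an external address was provided
    printf '%s\n' "$2"
  elif [ "$3" = "true" ]; then
    printf '%s:%s\n' "$4" "$6"
  else
    printf '%s:%s\n' "$4" "$5"
  fi
}

# zookeeper component enabled: the user-provided address is ignored
# (2281 is only a sample TLS port value)
zk_connect true "pulsar-mini-zookeeper.pulsar.svc.cluster.local:2181" false pulsar-mini-zookeeper 2181 2281
```

So with the zookeeper component still enabled, setting `userProvidedZookeepers` alone would not be expected to change the rendered address.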
So I tried setting userProvidedZookeepers:
pulsar_metadata:
  configurationStore: pulsar-cs-zookeeper
  configurationStoreMetadataPrefix: "/configuration-store"
  userProvidedZookeepers: "pulsar-mini-zookeeper.pulsar.svc.cluster.local:2181"
So I finally got this command to connect when run on the zookeeper pod:
root@pulsar-mini-zookeeper-0:/pulsar# until bin/bookkeeper org.apache.zookeeper.ZooKeeperMain -server pulsar-mini-zookeeper.pulsar.svc.cluster.local:2181 get /admin/clusters/pulsar-mini; do echo "pulsar cluster pulsar-mini isn't initialized yet ... check in 3 seconds ..." && sleep 3; done;
It results in the following error:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /admin/clusters/pulsar-mini
I would be really happy to see this error surfaced somewhere :-))) so there is no need to dig so deep.
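One way to make such an error visible would be to bound the wait loop so it eventually fails and leaves the last error in the container log, instead of looping forever. This is only a sketch of the idea, not what the chart does; `retry` is a made-up helper:

```shell
# Run a command until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 1 (with a message on stderr) if it never succeeds.
# Usage: retry MAX DELAY COMMAND [ARGS...]
# (retry is a hypothetical helper, not part of the Pulsar chart.)
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example with the check from this thread (20 attempts, 3 seconds apart):
#   retry 20 3 bin/bookkeeper org.apache.zookeeper.ZooKeeperMain \
#     -server pulsar-mini-zookeeper.pulsar.svc.cluster.local:2181 get /admin/clusters/pulsar-mini
```

A failing init container would then show up as an error state rather than a pod that sits in Init indefinitely.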
ref: https://github.com/apache/pulsar/issues/4480
I am now getting suspicious that it is the cluster metadata that is not in good shape:
- >
{{- include "pulsar.toolset.zookeeper.tls.settings" . | nindent 12 }}
bin/pulsar initialize-cluster-metadata \
--cluster {{ template "pulsar.cluster.name" . }} \
--zookeeper {{ template "pulsar.zookeeper.connect" . }}{{ .Values.metadataPrefix }} \
{{- if .Values.pulsar_metadata.configurationStore }}
--configuration-store {{ template "pulsar.configurationStore.connect" . }}{{ .Values.pulsar_metadata.configurationStoreMetadataPrefix }} \
{{- end }}
{{- if not .Values.pulsar_metadata.configurationStore }}
--configuration-store {{ template "pulsar.zookeeper.connect" . }}{{ .Values.metadataPrefix }} \
{{- end }}
--web-service-url http://{{ template "pulsar.fullname" . }}-{{ .Values.broker.component }}.{{ template "pulsar.namespace" . }}.svc.{{ .Values.clusterDomain }}:{{ .Values.broker.ports.http }}/ \
--web-service-url-tls https://{{ template "pulsar.fullname" . }}-{{ .Values.broker.component }}.{{ template "pulsar.namespace" . }}.svc.{{ .Values.clusterDomain }}:{{ .Values.broker.ports.https }}/ \
--broker-service-url pulsar://{{ template "pulsar.fullname" . }}-{{ .Values.broker.component }}.{{ template "pulsar.namespace" . }}.svc.{{ .Values.clusterDomain }}:{{ .Values.broker.ports.pulsar }}/ \
--broker-service-url-tls pulsar+ssl://{{ template "pulsar.fullname" . }}-{{ .Values.broker.component }}.{{ template "pulsar.namespace" . }}.svc.{{ .Values.clusterDomain }}:{{ .Values.broker.ports.pulsarssl }}/ || true;
This is the pulsar-cluster-initialize.yaml file, where the metadata gets initialized.
The script above looks suspicious to me because it provides both non-TLS and TLS URL endpoints, but maybe pulsar can handle this. I mean, I have TLS disabled in the values file, so I shouldn't be seeing any kind of https ....
I am not able to figure it out, but in the zookeeper logs at startup I see other suspicious messages:
pulsar-mini-zookeeper
21:54:01.902 [WorkerSender[myid=1]] WARN org.apache.zookeeper.server.quorum.QuorumPeer - Failed to resolve address: pulsar-mini-zookeeper-1.pulsar-mini-zookeeper.pulsar.svc.cluster.local
java.net.UnknownHostException: pulsar-mini-zookeeper-1.pulsar-mini-zookeeper.pulsar.svc.cluster.local
at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_282]
at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_282]
at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_282]
The message is only a WARN, but the address it is trying to connect to looks wrong to me: the pulsar-mini-zookeeper string shouldn't be there.
So, it seems the cluster starts when I comment out the following configuration:
#metadataPrefix: "/cluster1"
pulsar_metadata:
# configurationStore: pulsar-cs-zookeeper
# pulsar-cs-zookeeper
# configurationStoreMetadataPrefix: "/configuration-store"
I only enabled a limited set of components. I will continue uncommenting and redeploying to see which attribute causes the failures.
I also need to enable ingress/NodePort for the services so I can easily access them from localhost for testing.
So with metadataPrefix: "/cluster1" enabled, the cluster won't start.
The same failed state is observed with:
pulsar_metadata:
  configurationStore: pulsar-cs-zookeeper
So enabling either metadataPrefix or configurationStore causes the cluster not to initialize.
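Going by the broker-statefulset.yaml lines in the grep output above, the wait loop reads a znode whose path is the configured prefix followed by `/admin/clusters/<cluster name>`, so changing the prefix also changes which znode must exist before the broker starts. A small sketch of that path construction; `cluster_znode` is a made-up helper:

```shell
# Build the znode path the init container probes.
# Usage: cluster_znode PREFIX CLUSTER_NAME   (PREFIX may be empty)
# (cluster_znode is a hypothetical helper, for illustration only.)
cluster_znode() {
  printf '%s/admin/clusters/%s\n' "$1" "$2"
}

cluster_znode "" pulsar-mini          # no metadataPrefix set
cluster_znode /cluster1 pulsar-mini   # with metadataPrefix: "/cluster1"
```

If the initialization job did not write the cluster metadata under the same prefix (or on the same server), the loop would presumably keep reporting NoNode.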
The issue had no activity for 30 days, mark with Stale label.
Describe the bug
I am running on a qemu (libvirt) Vagrant k8s cluster with 1 master and 2 nodes, with the following config (just to show that it has enough resources):
I use the following values file (customized from the examples): https://gist.github.com/archenroot/c6c15b957758226473530825deae7649
I use the following sequence to install pulsar on k8s (TLS is disabled as there was an additional issue with the webhook):
During my tests I also experienced the following error with replicas set to 1 for bookkeeper, zookeeper and broker:
apache pulsar statefulsets.apps does not implement the scale subresource on
But at the moment I have a 2-replica config (as per the values file). I played a bit with disabling and enabling components, and after about 15 minutes the pulsar namespace looks like this:
zangetsu@andromeda ~ $ kubectl get pods -n pulsar
kubectl describe for all pods in Init state here:
Expected behavior
Pulsar is up and running...
Screenshots Pods in octant StatefulSets
Additional context
I am a bit lost about where to look for the possible issue.