Are you deploying to kubernetes and have you configured a readiness/liveness check which connects to the protocol adapters' exposed ports?
We deployed to Kubernetes using the Helm charts of the release and did not configure any readiness/liveness checks ourselves. I'm not sure whether such checks are part of the default installation of Hono.
I have just deployed 1.0-M7 to a local minikube using the Helm chart from the 1.0-M7 archive and I have no such entries in the log files. Can you post the output of
kubectl get pod -n hono
and
kubectl get service -n hono
maybe some pods/services have failed to deploy/start correctly?
(We deployed to namespace hono-m7.)
kubectl get pod -n hono-m7 (most pod IDs truncated)
NAME READY STATUS RESTARTS AGE
hono-m7-adapter-amqp-vertx-... 1/1 Running 0 19d
hono-m7-adapter-http-vertx-... 1/1 Running 0 19d
hono-m7-adapter-mqtt-vertx-... 1/1 Running 0 19d
hono-m7-artemis-... 1/1 Running 0 19d
hono-m7-dispatch-router-... 1/1 Running 0 19d
hono-m7-grafana-... 2/2 Running 707 19d
hono-m7-prometheus-server-... 2/2 Running 0 19d
hono-m7-service-auth-... 1/1 Running 1 19d
hono-m7-service-device-registry-0 1/1 Running 1 19d
kubectl get service -n hono-m7 (external IP addresses shaded)
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hono-m7-adapter-amqp-vertx LoadBalancer 10.0.97.107 ##IP 1## 5672:32672/TCP,5671:32671/TCP 19d
hono-m7-adapter-amqp-vertx-headless ClusterIP None <none> <none> 19d
hono-m7-adapter-http-vertx LoadBalancer 10.0.10.93 ##IP 2## 8080:30080/TCP,8443:30443/TCP 19d
hono-m7-adapter-http-vertx-headless ClusterIP None <none> <none> 19d
hono-m7-adapter-mqtt-vertx LoadBalancer 10.0.115.92 ##IP 3## 1883:31883/TCP,8883:30883/TCP 19d
hono-m7-adapter-mqtt-vertx-headless ClusterIP None <none> <none> 19d
hono-m7-artemis ClusterIP 10.0.188.240 <none> 5671/TCP 19d
hono-m7-dispatch-router ClusterIP 10.0.10.240 <none> 5673/TCP 19d
hono-m7-dispatch-router-ext LoadBalancer 10.0.254.154 ##IP 4## 15671:30671/TCP,15672:30672/TCP 19d
hono-m7-grafana ClusterIP 10.0.144.64 <none> 3000/TCP 19d
hono-m7-prometheus-server ClusterIP 10.0.209.79 <none> 9090/TCP 19d
hono-m7-service-auth ClusterIP 10.0.93.125 <none> 5671/TCP 19d
hono-m7-service-auth-headless ClusterIP None <none> <none> 19d
hono-m7-service-device-connection-headless ClusterIP None <none> <none> 19d
hono-m7-service-device-registry ClusterIP 10.0.202.227 <none> 5671/TCP 19d
hono-m7-service-device-registry-ext LoadBalancer 10.0.22.132 ##IP 5## 28080:31080/TCP,28443:31443/TCP 19d
hono-m7-service-device-registry-headless ClusterIP None <none> <none> 19d
Ok, this looks quite normal, apart from the high number of restarts of the Grafana pod, but anyway ... Can you post the output of
kubectl describe pod hono-m7-adapter-mqtt-vertx-... -n hono-m7
and can you please check if the output of
kubectl logs -f hono-m7-prometheus-server-... -c prometheus-server -n hono-m7
continuously produces new error messages similar to
level=error ts=2019-10-08T07:09:33.117Z caller=dns.go:155 component="discovery manager scrape" discovery=dns msg="Error refreshing DNS targets" err="could not resolve \"hono-service-device-registry-headless\": all servers responded with errors to at least one search domain"
It is ok if the log contains such errors during startup of Prometheus, but once all services are up, these log messages should no longer appear ...
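A quick way to separate such errors from the routine compaction chatter is to filter by log level. A small, self-contained sketch (the two sample lines are merely illustrative stand-ins; against a live cluster the input would come from the kubectl logs command above):

```shell
# Filter Prometheus server output for error-level lines so that recurring
# DNS-refresh errors stand out from the routine info-level messages.
printf '%s\n' \
  'level=info ts=2019-10-08T07:00:00.989Z caller=compact.go:495 component=tsdb msg="write block"' \
  'level=error ts=2019-10-08T07:09:33.117Z caller=dns.go:155 component="discovery manager scrape" msg="Error refreshing DNS targets"' |
grep 'level=error'
```

This prints only the second sample line; piping the real log through the same grep makes it easy to see whether the DNS errors keep recurring after startup.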
kubectl -n hono-m7 describe pod hono-m7-adapter-mqtt-vertx-... (sensitive data redacted)
Name: hono-m7-adapter-mqtt-vertx-...
Namespace: hono-m7
Priority: 0
Node: aks-nodepool1-...
Start Time: Wed, 18 Sep 2019 17:46:09 +0200
Labels: app.kubernetes.io/component=adapter-mqtt-vertx
app.kubernetes.io/instance=hono-m7
app.kubernetes.io/managed-by=Tiller
app.kubernetes.io/name=eclipse-hono
app.kubernetes.io/version=1.0-M7
helm.sh/chart=eclipse-hono-1.0-M7
pod-template-hash=5d5d968766
Annotations: <none>
Status: Running
IP: 10.244.2.153
IPs: <none>
Controlled By: ReplicaSet/hono-m7-adapter-mqtt-vertx-...
Containers:
eclipse-hono-adapter-mqtt-vertx:
Container ID: docker://...
Image: index.docker.io/eclipse/hono-adapter-mqtt-vertx:1.0-M7
Image ID: docker-pullable://eclipse/hono-adapter-mqtt-vertx@sha256:1e447eb005f439e39ebf5a2c5112fd775360ff8fa2ec91b2bdddc5307f91d562
Ports: 8088/TCP, 1883/TCP, 8883/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Wed, 18 Sep 2019 17:46:28 +0200
Ready: True
Restart Count: 0
Limits:
memory: 256Mi
Requests:
memory: 256Mi
Liveness: http-get https://:8088/liveness delay=180s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:8088/readiness delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
SPRING_CONFIG_LOCATION: file:///etc/hono/
SPRING_PROFILES_ACTIVE: dev
LOGGING_CONFIG: classpath:logback-spring.xml
_JAVA_OPTIONS: -XX:MinRAMPercentage=80 -XX:MaxRAMPercentage=80
KUBERNETES_NAMESPACE: hono-m7 (v1:metadata.namespace)
JAEGER_SERVICE_NAME: hono-m7-adapter-mqtt-vertx
Mounts:
/etc/hono from conf (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-242vq (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
conf:
Type: Secret (a volume populated by a Secret)
SecretName: hono-m7-adapter-mqtt-vertx-conf
Optional: false
default-token-242vq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-242vq
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
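For reference, the Liveness/Readiness lines in the output above correspond to a pod spec roughly like the following (a sketch only; the paths, port, scheme, and timings are taken from the describe output, everything else is generic probe boilerplate):

```yaml
livenessProbe:
  httpGet:
    path: /liveness
    port: 8088
    scheme: HTTPS
  initialDelaySeconds: 180
  timeoutSeconds: 1
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readiness
    port: 8088
    scheme: HTTPS
  initialDelaySeconds: 10
  timeoutSeconds: 1
  periodSeconds: 10
```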
The Prometheus log does not contain the error message you mentioned:
kubectl logs -f hono-m7-prometheus-server-... -c prometheus-server -n hono-m7
level=info ts=2019-10-08T01:00:00.948Z caller=compact.go:495 component=tsdb msg="write block" mint=1570485600000 maxt=1570492800000 ulid=01DPME2KDE1M0BGJ6QX13MANXS duration=134.69616ms
level=info ts=2019-10-08T01:00:00.957Z caller=head.go:586 component=tsdb msg="head GC completed" duration=979.195µs
level=info ts=2019-10-08T01:00:01.005Z caller=head.go:656 component=tsdb msg="WAL checkpoint complete" first=228 last=229 duration=48.201535ms
level=info ts=2019-10-08T03:00:00.943Z caller=compact.go:495 component=tsdb msg="write block" mint=1570492800000 maxt=1570500000000 ulid=01DPMMYANDPKNJ1D5H23YB53NQ duration=129.671226ms
level=info ts=2019-10-08T03:00:00.951Z caller=head.go:586 component=tsdb msg="head GC completed" duration=972.495µs
level=info ts=2019-10-08T03:00:01.121Z caller=compact.go:440 component=tsdb msg="compact blocks" count=3 mint=1570471200000 maxt=1570492800000 ulid=01DPMMYAT0SZE3AEJ2CRNPKY4R sources="[01DPM0B4XE4C7J5WYAGGZQSD2F 01DPM76W5D759MFTGGSJYG7YBR 01DPME2KDE1M0BGJ6QX13MANXS]" duration=161.096363ms
level=info ts=2019-10-08T03:00:01.308Z caller=compact.go:440 component=tsdb msg="compact blocks" count=3 mint=1570428000000 maxt=1570492800000 ulid=01DPMMYAZDS9JC0VZYPSFK60G2 sources="[01DPKBQZ9SCV08XZA95DZ6DCTD 01DPM0B55HVK9SD1X589SAAWM7 01DPMMYAT0SZE3AEJ2CRNPKY4R]" duration=175.287889ms
level=info ts=2019-10-08T05:00:01.032Z caller=compact.go:495 component=tsdb msg="write block" mint=1570500000000 maxt=1570507200000 ulid=01DPMVT1XEWJF22CCYFH6RY9KK duration=217.633726ms
level=info ts=2019-10-08T05:00:01.040Z caller=head.go:586 component=tsdb msg="head GC completed" duration=950.995µs
level=info ts=2019-10-08T05:00:01.086Z caller=head.go:656 component=tsdb msg="WAL checkpoint complete" first=230 last=231 duration=45.967851ms
level=info ts=2019-10-08T07:00:00.989Z caller=compact.go:495 component=tsdb msg="write block" mint=1570507200000 maxt=1570514400000 ulid=01DPN2NS5EZ8S2ZF849YV5J2PA duration=175.479847ms
level=info ts=2019-10-08T07:00:01.002Z caller=head.go:586 component=tsdb msg="head GC completed" duration=924.795µs
Am I right in assuming that
Node: aks-nodepool1-...
indicates that you have deployed to Microsoft's managed Kubernetes service on Azure (AKS), using their load balancers in front of the protocol adapters?
Bingo! We deployed to Azure Kubernetes Service, and the services of type LoadBalancer have frontend IP configurations in an Azure load balancer resource. The Azure load balancer resource has health probes with an interval of 5 seconds. When I modify the interval of the health probe for the TLS port exposed by the MQTT adapter, the block of 4 error messages recurs at the new interval instead.
So, yeah, the Azure load balancer health probes are the source of the log statements!
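In case it helps anyone hitting the same symptom: the probe interval can be inspected and changed with the Azure CLI. A sketch only (untested here; all resource names are placeholders that depend on your cluster, and the load balancer for an AKS cluster lives in the cluster's managed resource group):

```shell
# List the health probes configured on the cluster's load balancer.
az network lb probe list \
  --resource-group <managed-resource-group> \
  --lb-name <load-balancer-name> \
  --output table

# Raise the probe interval (in seconds) for the probe covering the
# MQTT adapter's TLS port, reducing the frequency of probe-induced log lines.
az network lb probe update \
  --resource-group <managed-resource-group> \
  --lb-name <load-balancer-name> \
  --name <probe-for-port-8883> \
  --interval 15
```

Note that a longer interval also means slower detection of an unhealthy backend, so this is a trade-off rather than a fix.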
Glad we found the source of the logs. However, I am afraid I cannot help you get rid of them, because I have no clue about the configuration of load balancers on Azure; for that I would kindly refer you to the Azure AKS documentation ... Would you mind closing this issue based on the current outcome?
With an installation of Hono 1.0-M7 (and earlier versions, too) that currently has no devices connected via AMQP or MQTT we see a lot of output like
AMQP adapter
MQTT adapter
This is interesting because there is a clear temporal pattern: 4 sources of log messages, each producing a log statement every 5 seconds. I.e., if you group them by 5-second interval:
AMQP adapter
MQTT adapter
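This 5-second grouping can be reproduced mechanically. A small, self-contained sketch (the sample lines are made up; against a live cluster the input would come from kubectl logs on the adapter pod) that buckets log lines into 5-second windows by the HH:MM:SS part of their ts= field:

```shell
printf '%s\n' \
  'level=info ts=2019-10-08T07:09:30.117Z msg="connection closed"' \
  'level=info ts=2019-10-08T07:09:31.020Z msg="connection closed"' \
  'level=info ts=2019-10-08T07:09:35.500Z msg="connection closed"' |
awk -F'ts=' '{
  t = substr($2, 12, 8)        # "HH:MM:SS" after the date and the "T"
  split(t, p, ":")
  s = p[1] * 3600 + p[2] * 60 + p[3]
  count[int(s / 5) * 5]++      # 5-second bucket, in seconds since midnight
}
END { for (b in count) print b, count[b] }' | sort -n
```

For the three sample lines this prints two buckets (25770 with 2 lines, 25775 with 1 line), making a probe-driven cadence easy to spot in a real log.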
This bloats the logs somewhat and makes debugging harder. Is anyone else experiencing a similar issue?