
Eclipse Hono™ Project
https://eclipse.dev/hono
Eclipse Public License 2.0

ClosedChannelExceptions in logs of AMQP and MQTT adapter #1549

ghost closed this issue 4 years ago

ghost commented 4 years ago

With an installation of Hono 1.0-M7 (and earlier versions, too) that currently has no devices connected via AMQP or MQTT, we see a lot of output like the following:

AMQP adapter

15:04:21.197 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:21.629 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:22.739 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:24.078 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:26.197 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:26.630 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:27.740 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:29.078 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:31.199 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:31.632 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:32.743 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:34.079 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:36.200 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:36.634 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:37.744 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:39.079 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:41.202 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:41.635 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:42.747 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:04:44.080 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException

MQTT adapter

15:09:11.198 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:11.739 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:13.684 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:14.120 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:16.199 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:16.740 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:18.687 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:19.121 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:21.201 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:21.741 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:23.690 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:24.120 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:26.202 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:26.742 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:28.691 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:29.120 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:31.203 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException
15:09:31.743 [vert.x-eventloop-thread-0] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin null failed to connect over ssl: java.nio.channels.ClosedChannelException

This is interesting because there is a clear temporal pattern: four distinct sources each produce one log statement every 5 seconds. Grouping the timestamps by source:

AMQP adapter

15:04:21.197
15:04:26.197
15:04:31.199
15:04:36.200
15:04:41.202

15:04:21.629
15:04:26.630
15:04:31.632
15:04:36.634
15:04:41.635

15:04:22.739
15:04:27.740
15:04:32.743
15:04:37.744
15:04:42.747

15:04:24.078
15:04:29.078
15:04:34.079
15:04:39.079
15:04:44.080

MQTT adapter

15:09:11.198
15:09:16.199
15:09:21.201
15:09:26.202
15:09:31.203

15:09:11.739
15:09:16.740
15:09:21.741
15:09:26.742
15:09:31.743

15:09:13.684
15:09:18.687
15:09:23.690
15:09:28.691

15:09:14.120
15:09:19.121
15:09:24.120
15:09:29.120
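
The grouping above can be reproduced with something along these lines (just a sketch; it assumes the log excerpt has been saved to a file, here called adapter.log, and buckets each message by its offset within the 5-second cycle):

awk -F'[ :.]' '/ClosedChannelException/ {
  ms = ($1*3600 + $2*60 + $3) * 1000 + $4   # timestamp in milliseconds since midnight
  print int((ms % 5000) / 500)              # 500 ms bucket within the 5-second cycle
}' adapter.log | sort -n | uniq -c

For the AMQP excerpt above this prints four buckets with five entries each, i.e. four independent clients, each connecting once every 5 seconds.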

This flood of messages bloats the logs and makes debugging harder. Is anyone else experiencing a similar issue?

sophokles73 commented 4 years ago

Are you deploying to Kubernetes, and have you configured a readiness/liveness check that connects to the protocol adapters' exposed ports?

ghost commented 4 years ago

We deployed to Kubernetes using the Helm charts of the release and did not configure any readiness/liveness checks ourselves. I'm not sure whether such checks are part of the default installation of Hono.

sophokles73 commented 4 years ago

I have just deployed 1.0-M7 to a local minikube using the Helm chart from the 1.0-M7 archive and I have no such entries in the log files. Can you post the output of

kubectl get pod -n hono

and

kubectl get service -n hono

Maybe some pods/services have failed to deploy or start correctly?

ghost commented 4 years ago

(We deployed to namespace hono-m7.)

kubectl get pod -n hono-m7 (most pod IDs truncated)

NAME                                READY   STATUS    RESTARTS   AGE
hono-m7-adapter-amqp-vertx-...      1/1     Running   0          19d
hono-m7-adapter-http-vertx-...      1/1     Running   0          19d
hono-m7-adapter-mqtt-vertx-...      1/1     Running   0          19d
hono-m7-artemis-...                 1/1     Running   0          19d
hono-m7-dispatch-router-...         1/1     Running   0          19d
hono-m7-grafana-...                 2/2     Running   707        19d
hono-m7-prometheus-server-...       2/2     Running   0          19d
hono-m7-service-auth-...            1/1     Running   1          19d
hono-m7-service-device-registry-0   1/1     Running   1          19d

kubectl get service -n hono-m7 (external IP addresses redacted)

NAME                                         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                           AGE
hono-m7-adapter-amqp-vertx                   LoadBalancer   10.0.97.107    ##IP 1##         5672:32672/TCP,5671:32671/TCP     19d
hono-m7-adapter-amqp-vertx-headless          ClusterIP      None           <none>           <none>                            19d
hono-m7-adapter-http-vertx                   LoadBalancer   10.0.10.93     ##IP 2##         8080:30080/TCP,8443:30443/TCP     19d
hono-m7-adapter-http-vertx-headless          ClusterIP      None           <none>           <none>                            19d
hono-m7-adapter-mqtt-vertx                   LoadBalancer   10.0.115.92    ##IP 3##         1883:31883/TCP,8883:30883/TCP     19d
hono-m7-adapter-mqtt-vertx-headless          ClusterIP      None           <none>           <none>                            19d
hono-m7-artemis                              ClusterIP      10.0.188.240   <none>           5671/TCP                          19d
hono-m7-dispatch-router                      ClusterIP      10.0.10.240    <none>           5673/TCP                          19d
hono-m7-dispatch-router-ext                  LoadBalancer   10.0.254.154   ##IP 4##         15671:30671/TCP,15672:30672/TCP   19d
hono-m7-grafana                              ClusterIP      10.0.144.64    <none>           3000/TCP                          19d
hono-m7-prometheus-server                    ClusterIP      10.0.209.79    <none>           9090/TCP                          19d
hono-m7-service-auth                         ClusterIP      10.0.93.125    <none>           5671/TCP                          19d
hono-m7-service-auth-headless                ClusterIP      None           <none>           <none>                            19d
hono-m7-service-device-connection-headless   ClusterIP      None           <none>           <none>                            19d
hono-m7-service-device-registry              ClusterIP      10.0.202.227   <none>           5671/TCP                          19d
hono-m7-service-device-registry-ext          LoadBalancer   10.0.22.132    ##IP 5##         28080:31080/TCP,28443:31443/TCP   19d
hono-m7-service-device-registry-headless     ClusterIP      None           <none>           <none>                            19d

sophokles73 commented 4 years ago

OK, this looks quite normal, apart from the high number of restarts of the Grafana pod, but anyway ... Can you post the output of

kubectl describe pod hono-m7-adapter-mqtt-vertx-... -n hono-m7

and can you please check if the output of

kubectl logs -f hono-m7-prometheus-server-... -c prometheus-server -n hono-m7

continuously produces new error messages similar to

level=error ts=2019-10-08T07:09:33.117Z caller=dns.go:155 component="discovery manager scrape" discovery=dns msg="Error refreshing DNS targets" err="could not resolve \"hono-service-device-registry-headless\": all servers responded with errors to at least one search domain"

It is OK if the log contains such errors while Prometheus is starting up, but once all services are up, these log messages should no longer appear ...

ghost commented 4 years ago

kubectl -n hono-m7 describe pod hono-m7-adapter-mqtt-vertx-... (sensitive data redacted)

Name:           hono-m7-adapter-mqtt-vertx-...
Namespace:      hono-m7
Priority:       0
Node:           aks-nodepool1-...
Start Time:     Wed, 18 Sep 2019 17:46:09 +0200
Labels:         app.kubernetes.io/component=adapter-mqtt-vertx
                app.kubernetes.io/instance=hono-m7
                app.kubernetes.io/managed-by=Tiller
                app.kubernetes.io/name=eclipse-hono
                app.kubernetes.io/version=1.0-M7
                helm.sh/chart=eclipse-hono-1.0-M7
                pod-template-hash=5d5d968766
Annotations:    <none>
Status:         Running
IP:             10.244.2.153
IPs:            <none>
Controlled By:  ReplicaSet/hono-m7-adapter-mqtt-vertx-...
Containers:
  eclipse-hono-adapter-mqtt-vertx:
    Container ID:   docker://...
    Image:          index.docker.io/eclipse/hono-adapter-mqtt-vertx:1.0-M7
    Image ID:       docker-pullable://eclipse/hono-adapter-mqtt-vertx@sha256:1e447eb005f439e39ebf5a2c5112fd775360ff8fa2ec91b2bdddc5307f91d562
    Ports:          8088/TCP, 1883/TCP, 8883/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 18 Sep 2019 17:46:28 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  256Mi
    Requests:
      memory:   256Mi
    Liveness:   http-get https://:8088/liveness delay=180s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:8088/readiness delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      SPRING_CONFIG_LOCATION:  file:///etc/hono/
      SPRING_PROFILES_ACTIVE:  dev
      LOGGING_CONFIG:          classpath:logback-spring.xml
      _JAVA_OPTIONS:           -XX:MinRAMPercentage=80 -XX:MaxRAMPercentage=80
      KUBERNETES_NAMESPACE:    hono-m7 (v1:metadata.namespace)
      JAEGER_SERVICE_NAME:     hono-m7-adapter-mqtt-vertx
    Mounts:
      /etc/hono from conf (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-242vq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  conf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hono-m7-adapter-mqtt-vertx-conf
    Optional:    false
  default-token-242vq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-242vq
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

The Prometheus log does not contain the error message you mentioned. This is the output of kubectl logs -f hono-m7-prometheus-server-... -c prometheus-server -n hono-m7:

level=info ts=2019-10-08T01:00:00.948Z caller=compact.go:495 component=tsdb msg="write block" mint=1570485600000 maxt=1570492800000 ulid=01DPME2KDE1M0BGJ6QX13MANXS duration=134.69616ms
level=info ts=2019-10-08T01:00:00.957Z caller=head.go:586 component=tsdb msg="head GC completed" duration=979.195µs
level=info ts=2019-10-08T01:00:01.005Z caller=head.go:656 component=tsdb msg="WAL checkpoint complete" first=228 last=229 duration=48.201535ms
level=info ts=2019-10-08T03:00:00.943Z caller=compact.go:495 component=tsdb msg="write block" mint=1570492800000 maxt=1570500000000 ulid=01DPMMYANDPKNJ1D5H23YB53NQ duration=129.671226ms
level=info ts=2019-10-08T03:00:00.951Z caller=head.go:586 component=tsdb msg="head GC completed" duration=972.495µs
level=info ts=2019-10-08T03:00:01.121Z caller=compact.go:440 component=tsdb msg="compact blocks" count=3 mint=1570471200000 maxt=1570492800000 ulid=01DPMMYAT0SZE3AEJ2CRNPKY4R sources="[01DPM0B4XE4C7J5WYAGGZQSD2F 01DPM76W5D759MFTGGSJYG7YBR 01DPME2KDE1M0BGJ6QX13MANXS]" duration=161.096363ms
level=info ts=2019-10-08T03:00:01.308Z caller=compact.go:440 component=tsdb msg="compact blocks" count=3 mint=1570428000000 maxt=1570492800000 ulid=01DPMMYAZDS9JC0VZYPSFK60G2 sources="[01DPKBQZ9SCV08XZA95DZ6DCTD 01DPM0B55HVK9SD1X589SAAWM7 01DPMMYAT0SZE3AEJ2CRNPKY4R]" duration=175.287889ms
level=info ts=2019-10-08T05:00:01.032Z caller=compact.go:495 component=tsdb msg="write block" mint=1570500000000 maxt=1570507200000 ulid=01DPMVT1XEWJF22CCYFH6RY9KK duration=217.633726ms
level=info ts=2019-10-08T05:00:01.040Z caller=head.go:586 component=tsdb msg="head GC completed" duration=950.995µs
level=info ts=2019-10-08T05:00:01.086Z caller=head.go:656 component=tsdb msg="WAL checkpoint complete" first=230 last=231 duration=45.967851ms
level=info ts=2019-10-08T07:00:00.989Z caller=compact.go:495 component=tsdb msg="write block" mint=1570507200000 maxt=1570514400000 ulid=01DPN2NS5EZ8S2ZF849YV5J2PA duration=175.479847ms
level=info ts=2019-10-08T07:00:01.002Z caller=head.go:586 component=tsdb msg="head GC completed" duration=924.795µs

sophokles73 commented 4 years ago

Am I right in assuming that

Node:           aks-nodepool1-...

indicates that you have deployed to Azure's managed Kubernetes service (AKS) using their load balancers in front of the protocol adapters?

ghost commented 4 years ago

Bingo! We deployed to Azure Kubernetes Service, and the services of type LoadBalancer have frontend IP configurations in an Azure load balancer resource. That load balancer has health probes with an interval of 5 seconds. When I modify the interval of the health probe for the TLS port exposed by the MQTT adapter, the corresponding group of error messages follows the new interval.

So, yeah, the Azure load balancer health probes are the source of the log statements!
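
For reference, inspecting and changing the probe interval with the Azure CLI looks roughly like this (a sketch with placeholder resource names; AKS normally creates a load balancer named "kubernetes" in the cluster's node resource group, and the probe name below is hypothetical):

# list the health probes of the AKS-managed load balancer
az network lb probe list --resource-group MC_myGroup_myCluster_westeurope --lb-name kubernetes --output table

# raise the interval of the probe that belongs to the MQTT adapter's TLS port, e.g. from 5 to 15 seconds
az network lb probe update --resource-group MC_myGroup_myCluster_westeurope --lb-name kubernetes --name <probe-name> --interval 15

Keep in mind that Kubernetes manages these probes for services of type LoadBalancer, so manual changes may be overwritten the next time the service is reconciled.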

sophokles73 commented 4 years ago

Glad we found the source of the logs. However, I am afraid I cannot help you with getting rid of them, because I have no clue about configuring load balancers on Azure. For that, I would kindly refer you to the Azure AKS documentation ... Would you mind closing this issue based on the current outcome?
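
If the goal is mainly to keep the adapter logs readable, another option might be to raise the threshold of that particular logger in a custom logback configuration (the adapters reference their logging configuration via the LOGGING_CONFIG variable shown in the pod description above). Just a sketch, to be placed inside the configuration element:

<!-- silence the probe-induced connection errors reported by Vert.x's NetServerImpl -->
<logger name="io.vertx.core.net.impl.NetServerImpl" level="OFF"/>

This only hides the messages rather than preventing the probe connections, so the health probes keep working, but it also suppresses genuine connection errors from that class.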