emqxarchive / emqx-docker

This repository is no longer maintained, please go to https://github.com/emqx/emqx-rel/tree/master/deploy/docker
Apache License 2.0

Healthcheck makes emqx exit abnormally #143

Open renatomotorline opened 4 years ago

renatomotorline commented 4 years ago

Expected behavior

The container executes the healthcheck and does not exit.

Actual behavior

The container exits abnormally approximately 40 seconds after start, even though the healthcheck interval is 2m. If I remove the healthcheck, everything works.

Docker logs

node.max_ports=1048576
listener.tcp.external.acceptors=64
listener.ssl.external.acceptors=32
node.process_limit=2097152
node.max_ets_tables=2097152
cluster.discovery=static
cluster.discovery=static
listener.ws.external.acceptors=16
node.name=emqx@emqx.backend02
cluster.static.seeds=emqx@emqx.backend01, emqx@emqx.backend02, emqx@emqx.backend03
cluster.static.seeds=emqx@emqx.backend01, emqx@emqx.backend02, emqx@emqx.backend03
EMQX_LOADED_PLUGINS="emqx_management,emqx_auth_http,emqx_recon,emqx_retainer,emqx_dashboard"

=====
===== LOGGING STARTED Fri May  8 14:28:32 UTC 2020
=====
Exec: /opt/emqx/erts-10.5.6/bin/erlexec -boot /opt/emqx/releases/v4.0.5/emqx -mode embedded -boot_var ERTS_LIB_DIR /opt/emqx/erts-10.5.6/../lib -mnesia dir "/opt/emqx/data/mnesia/emqx@emqx.backend02" -config /opt/emqx/data/configs/app.2020.05.08.14.28.34.config -args_file /opt/emqx/data/configs/vm.2020.05.08.14.28.34.args -vm_args /opt/emqx/data/configs/vm.2020.05.08.14.28.34.args -start_epmd false -epmd_module ekka_epmd -proto_dist ekka -- console
Root: /opt/emqx
/opt/emqx
Starting emqx on node emqx@emqx.backend02
Start http:management listener on 8081 successfully.
Start http:dashboard listener on 18083 successfully.
Start mqtt:tcp listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp listener on 0.0.0.0:1883 successfully.
Start mqtt:ws listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl listener on 0.0.0.0:8883 successfully.
Start mqtt:wss listener on 0.0.0.0:8084 successfully.
EMQ X Broker 4.0.5 is running now!
Eshell V10.5.6  (abort with ^G)
(emqx@emqx.backend02)1> ['2020-05-08T14:29:04Z']:emqx exit abnormally

Test Dockerfile

# Use the emqx official
FROM emqx/emqx:v4.0.5

# Go to config folder
WORKDIR /opt/emqx/etc

# Install curl for the health check
RUN sudo apk add --no-cache curl

# Set user
USER emqx

# Copy configurations
COPY emqx.conf .
COPY plugins/* ./plugins/

# emqx will occupy these port:
# - 1883 port for MQTT
# - 8080 for mgmt API
# - 8083 for WebSocket/HTTP
# - 8084 for WSS/HTTPS
# - 8883 port for MQTT(SSL)
# - 11883 port for internal MQTT/TCP
# - 18083 for dashboard
# - 4369 for port mapping
# - 5369 for gen_rpc port mapping
# - 6369 for distributed node
EXPOSE 1883 8080 8081 8083 8084 8883 11883 18083 4369 5369 6369

HEALTHCHECK --interval=2m --timeout=3s --retries=3 \
  CMD curl -f --basic -u emqx-backend:public -k http://localhost:8081/api/v4/brokers || exit 1

EMQ version

emqx/emqx:v4.0.5

Docker version

Which docker-engine version?

docker -v
Docker version 19.03.8, build afacb8b

How docker info?

docker info
Docker version 19.03.8, build afacb8b
[root@docker1 docker-workspace]# docker info
Client:
 Debug Mode: false

Server:
 Containers: 5
  Running: 5
  Paused: 0
  Stopped: 0
 Images: 40
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: qlm1hzy33qg8h8b8oe18httvo
  Is Manager: true
  ClusterID: lod1cg5a3dmoc87v839s4i24u
  Managers: 1
  Nodes: 3
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.16.21
  Manager Addresses:
   192.168.16.21:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.18.0-147.8.1.el8_1.x86_64
 Operating System: CentOS Linux 8 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 3.692GiB
 Name: docker1.localdomain
 ID: RZZ2:XOOT:4FIB:GH4B:G6LD:6Z3K:7RDO:UMJ3:3FHV:3XC2:LEWH:NI5A
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

System

What system do you use? CentOS Linux release 8.1.1911 (Core)

Rory-Z commented 4 years ago

Hi @renatomotorline, sorry for the late reply. I suggest you change the health check address to http://127.0.0.1:8081/status and try again.
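For reference, the suggested probe can be sketched as a small shell function. This is only an illustration: the function name check_health and the 3-second limit (borrowed from --timeout=3s in the Dockerfile above) are assumptions, and what the /status endpoint actually returns has not been verified here.

```shell
#!/bin/sh
# Probe a health URL; return 0 on an HTTP success status, 1 otherwise.
# check_health is a hypothetical helper name, not part of emqx or Docker.
#   -f          treat HTTP error statuses (>= 400) as failures
#   -s          silent, no progress output
#   --max-time  hard limit for the whole request, in seconds
check_health() {
  url=${1:-http://127.0.0.1:8081/status}
  curl -f -s --max-time 3 -o /dev/null "$url" || return 1
}
```

In the Dockerfile, the equivalent would be to keep the existing HEALTHCHECK line and only swap the URL passed to curl.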

renatomotorline commented 4 years ago

@zhanghongtong the problem persists, even if I set the healthcheck command to exit 0.

Rory-Z commented 4 years ago

@renatomotorline Can you share the contents of log/emqx.log.* and log/erlang.log.* after the health check failed?

renatomotorline commented 4 years ago

I think the failure happens before the first health check, because the health check has an interval of 2 minutes and the container fails after approximately 40 seconds. If I run the container directly, it works fine.

docker run emqx-backend:1.0.0

If I deploy in the docker swarm the problem happens.

docker stack deploy --with-registry-auth -c docker-compose.yml emqx-backend
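As a rough check of that timing argument, the gap between the LOGGING STARTED line and the "emqx exit abnormally" line in the Docker logs of this report can be computed from the two timestamps. This is a sketch: elapsed_seconds is a hypothetical helper, it assumes a date command that accepts -d (GNU coreutils or BusyBox), and the log start time is only an approximation of the container start time.

```shell
#!/bin/sh
# Seconds between two UTC timestamps given as "YYYY-MM-DD hh:mm:ss".
# Assumes a `date` that supports -d (GNU coreutils or BusyBox).
elapsed_seconds() {
  start=$(date -u -d "$1" +%s)
  stop=$(date -u -d "$2" +%s)
  echo $((stop - start))
}

# Timestamps copied from the Docker logs in this report.
started="2020-05-08 14:28:32"   # LOGGING STARTED Fri May  8 14:28:32 UTC 2020
exited="2020-05-08 14:29:04"    # ['2020-05-08T14:29:04Z']:emqx exit abnormally
interval=120                    # HEALTHCHECK --interval=2m, in seconds

elapsed=$(elapsed_seconds "$started" "$exited")
if [ "$elapsed" -lt "$interval" ]; then
  echo "exit after ${elapsed}s, before the first probe at ${interval}s"
else
  echo "exit after ${elapsed}s, after the first probe at ${interval}s"
fi
```

Since Docker runs the first probe one interval after the container starts, an exit well inside the first 120 seconds is consistent with the probe never having run.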

The /opt/emqx/log/emqx.log.* files contain only unreadable output (�).

/opt/emqx/log/erlang.log.*

cat /var/lib/docker/volumes/emqx-backend_emqx-backend01-logs/_data/erlang.log.*

=====
===== LOGGING STARTED Tue May 12 09:46:43 UTC 2020
=====
Exec: /opt/emqx/erts-10.5.6/bin/erlexec -boot /opt/emqx/releases/v4.0.5/emqx -mode embedded -boot_var ERTS_LIB_DIR /opt/emqx/erts-10.5.6/../lib -mnesia dir "/opt/emqx/data/mnesia/emqx@emqx.backend01" -config /opt/emqx/data/configs/app.2020.05.12.09.46.45.config -args_file /opt/emqx/data/configs/vm.2020.05.12.09.46.45.args -vm_args /opt/emqx/data/configs/vm.2020.05.12.09.46.45.args -start_epmd false -epmd_module ekka_epmd -proto_dist ekka -- console
Root: /opt/emqx
/opt/emqx
Starting emqx on node emqx@emqx.backend01
Start http:management listener on 8081 successfully.
Start http:dashboard listener on 18083 successfully.
Start mqtt:tcp listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp listener on 0.0.0.0:1883 successfully.
Start mqtt:ws listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl listener on 0.0.0.0:8883 successfully.
Start mqtt:wss listener on 0.0.0.0:8084 successfully.
EMQ X Broker 4.0.5 is running now!
Eshell V10.5.6  (abort with ^G)

docker-compose.yml

version: "3.3"

services:
  emqx-backend01:
    image: emqx-backend:1.0.0
    environment:
      - "EMQX_LOADED_PLUGINS=\"emqx_management,emqx_auth_http,emqx_recon,emqx_retainer,emqx_dashboard\""
      - "EMQX_NAME=emqx"
      - "EMQX_HOST=emqx.backend01"
      - "EMQX_CLUSTER__DISCOVERY=static"
      - "EMQX_CLUSTER__STATIC__SEEDS=emqx@emqx.backend01, emqx@emqx.backend02, emqx@emqx.backend03"
    volumes:
      - emqx-backend01-logs:/opt/emqx/log
    deploy:
      replicas: 1
      placement:
        constraints:
          # - node.role == worker
          - node.labels.worker == 1
      restart_policy:
        condition: none
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=local-network"
        # http with redirection
        - "traefik.http.routers.emqx-backend-dashboard.entrypoints=web"
        - "traefik.http.routers.emqx-backend-dashboard.rule=Host(`<HOSTNAME>`)"
        - "traefik.http.middlewares.redirect-middleware.redirectscheme.scheme=https"
        - "traefik.http.routers.emqx-backend-dashboard.middlewares=redirect-middleware"
        # https
        - "traefik.http.routers.emqx-backend-dashboard-secure.rule=Host(`<HOSTNAME>`)"
        - "traefik.http.routers.emqx-backend-dashboard-secure.entrypoints=websecure"
        - "traefik.http.routers.emqx-backend-dashboard-secure.tls=true"
        - "traefik.http.routers.emqx-backend-dashboard-secure.tls.certresolver=myresolver"
        - "traefik.http.routers.emqx-backend-dashboard-secure.service=service-emqx-backend-dashboard"
        - "traefik.http.services.service-emqx-backend-dashboard.loadbalancer.server.port=18083"
        # mqtts
        - "traefik.tcp.routers.emqx-backend-mqtts.entrypoints=mqtts-backend"
        - "traefik.tcp.routers.emqx-backend-mqtts.rule=HostSNI(`<HOSTNAME>`)"
        - "traefik.tcp.routers.emqx-backend-mqtts.tls=true"
        - "traefik.tcp.routers.emqx-backend-mqtts.tls.certresolver=myresolver"
        - "traefik.tcp.routers.emqx-backend-mqtts.service=service-mqtts-backend"
        - "traefik.tcp.services.service-mqtts-backend.loadbalancer.server.port=1883"
    networks:
      local-network:
        aliases:
        - emqx.backend01
        - emqx.backend

  emqx-backend02:
    image: emqx-backend:1.0.0
    environment:
      - "EMQX_LOADED_PLUGINS=\"emqx_management,emqx_auth_http,emqx_recon,emqx_retainer,emqx_dashboard\""
      - "EMQX_NAME=emqx"
      - "EMQX_HOST=emqx.backend02"
      - "EMQX_CLUSTER__DISCOVERY=static"
      - "EMQX_CLUSTER__STATIC__SEEDS=emqx@emqx.backend01, emqx@emqx.backend02, emqx@emqx.backend03"
    deploy:
      replicas: 1
      placement:
        constraints:
          # - node.role == worker
          - node.labels.worker == 2
      restart_policy:
        condition: on-failure
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=local-network"
        # http with redirection
        - "traefik.http.routers.emqx-backend-dashboard.entrypoints=web"
        - "traefik.http.routers.emqx-backend-dashboard.rule=Host(`<HOSTNAME>`)"
        - "traefik.http.middlewares.redirect-middleware.redirectscheme.scheme=https"
        - "traefik.http.routers.emqx-backend-dashboard.middlewares=redirect-middleware"
        # https
        - "traefik.http.routers.emqx-backend-dashboard-secure.rule=Host(`<HOSTNAME>`)"
        - "traefik.http.routers.emqx-backend-dashboard-secure.entrypoints=websecure"
        - "traefik.http.routers.emqx-backend-dashboard-secure.tls=true"
        - "traefik.http.routers.emqx-backend-dashboard-secure.tls.certresolver=myresolver"
        - "traefik.http.routers.emqx-backend-dashboard-secure.service=service-emqx-backend-dashboard"
        - "traefik.http.services.service-emqx-backend-dashboard.loadbalancer.server.port=18083"
        # mqtts
        - "traefik.tcp.routers.emqx-backend-mqtts.entrypoints=mqtts-backend"
        - "traefik.tcp.routers.emqx-backend-mqtts.rule=HostSNI(`<HOSTNAME>`)"
        - "traefik.tcp.routers.emqx-backend-mqtts.tls=true"
        - "traefik.tcp.routers.emqx-backend-mqtts.tls.certresolver=myresolver"
        - "traefik.tcp.routers.emqx-backend-mqtts.service=service-mqtts-backend"
        - "traefik.tcp.services.service-mqtts-backend.loadbalancer.server.port=1883"
    networks:
      local-network:
        aliases:
        - emqx.backend02
        - emqx.backend

  emqx-backend03:
    image: emqx-backend:1.0.0
    environment:
      - "EMQX_LOADED_PLUGINS=\"emqx_management,emqx_auth_http,emqx_recon,emqx_retainer,emqx_dashboard\""
      - "EMQX_NAME=emqx"
      - "EMQX_HOST=emqx.backend03"
      - "EMQX_CLUSTER__DISCOVERY=static"
      - "EMQX_CLUSTER__STATIC__SEEDS=emqx@emqx.backend01, emqx@emqx.backend02, emqx@emqx.backend03"
    deploy:
      replicas: 1
      placement:
        constraints:
          # - node.role == worker
          # - node.labels.mongo.replica == 3
          - node.labels.worker == 3
      restart_policy:
        condition: on-failure
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=local-network"
        # http with redirection
        - "traefik.http.routers.emqx-backend-dashboard.entrypoints=web"
        - "traefik.http.routers.emqx-backend-dashboard.rule=Host(`<HOSTNAME>`)"
        - "traefik.http.middlewares.redirect-middleware.redirectscheme.scheme=https"
        - "traefik.http.routers.emqx-backend-dashboard.middlewares=redirect-middleware"
        # https
        - "traefik.http.routers.emqx-backend-dashboard-secure.rule=Host(`<HOSTNAME>`)"
        - "traefik.http.routers.emqx-backend-dashboard-secure.entrypoints=websecure"
        - "traefik.http.routers.emqx-backend-dashboard-secure.tls=true"
        - "traefik.http.routers.emqx-backend-dashboard-secure.tls.certresolver=myresolver"
        - "traefik.http.routers.emqx-backend-dashboard-secure.service=service-emqx-backend-dashboard"
        - "traefik.http.services.service-emqx-backend-dashboard.loadbalancer.server.port=18083"
        # mqtts
        - "traefik.tcp.routers.emqx-backend-mqtts.entrypoints=mqtts-backend"
        - "traefik.tcp.routers.emqx-backend-mqtts.rule=HostSNI(`<HOSTNAME>`)"
        - "traefik.tcp.routers.emqx-backend-mqtts.tls=true"
        - "traefik.tcp.routers.emqx-backend-mqtts.tls.certresolver=myresolver"
        - "traefik.tcp.routers.emqx-backend-mqtts.service=service-mqtts-backend"
        - "traefik.tcp.services.service-mqtts-backend.loadbalancer.server.port=1883"
    networks:
      local-network:
        aliases:
        - emqx.backend03
        - emqx.backend

networks:
  local-network:
    external: true

volumes:
  emqx-backend01-logs:

Rory-Z commented 4 years ago

@renatomotorline Sorry, I don't have a docker swarm cluster, so I can't test the docker-compose.yml file you gave me. However, after I deleted the deploy sections from the file and set the network to driver: bridge, the cluster deployed successfully with docker-compose and the health check passed. I therefore recommend that you review your own docker swarm deployment.
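One possible starting point for that review is to ask swarm why the task stopped. The sketch below wraps two standard Docker CLI commands; swarm_task_report is a hypothetical helper name, and the stack and service names in the example are taken from the docker stack deploy command and compose file posted above.

```shell
#!/bin/sh
# Print task state and recent logs for one service of a swarm stack.
# swarm_task_report is a hypothetical helper, not a Docker command.
# Swarm names stack services "<stack>_<service>".
swarm_task_report() {
  stack=$1
  service=$2
  name="${stack}_${service}"
  # Task history, with the untruncated error column for failed tasks.
  docker service ps --no-trunc "$name"
  # Last lines of output collected from the service's tasks.
  docker service logs --tail 50 "$name"
}

# Example (must be run on a swarm manager node):
# swarm_task_report emqx-backend emqx-backend02
```

The error column of docker service ps usually states whether a task was shut down by the orchestrator or exited by itself, which may help separate a swarm scheduling problem from an emqx failure.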