docker-flow / docker-flow-proxy
https://docker-flow.github.io/docker-flow-proxy/

dfp multiple services #11

Closed RaymondMouthaan closed 6 years ago

RaymondMouthaan commented 6 years ago

Hi @vfarcic and @thomasjpfan,

Following up on our Skype session, I am submitting this issue, which is possibly related to #6.

Please consider the following compose file, which creates two services, mqtt_emqtt-master and mqtt_emqtt-worker. Each service deploys one container.

################################################################################
# MQTT Stack
################################################################################
#$ docker stack deploy mqtt --compose-file docker-compose-mqtt.yml
################################################################################
version: "3.6"

services:
  emqtt-master:
    image: raymondmm/emqtt
    hostname: emqtt-master
    environment:
      - "EMQ_NAME=emq"
      - "EMQ_HOST=master.mq.tt"
      - "EMQ_NODE__COOKIE=ef16498f66804df1cc6172f6996d5492"
      - "EMQ_WAIT_TIME=60"
    networks:
      indonesia-net:
        aliases:
          - master.mq.tt
      proxy_indonesia-net:
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/TZ:ro
    deploy:
      placement:
        constraints: [node.role == manager]
      labels:
        - com.df.notify=true

        # EMQTT dashboard
        - com.df.serviceDomain.1=emqtt.indonesia
        - com.df.port.1=18083
        - com.df.reqMode.1=http

        # EMQTT tcp connection
        - com.df.port.2=1883
        - com.df.srcPort.2=1883
        - com.df.reqMode.2=tcp

  emqtt-worker:
    image: raymondmm/emqtt
    hostname: emqtt-worker
    environment:
      - "EMQ_JOIN_CLUSTER=emq@master.mq.tt"
      - "EMQ_NODE__COOKIE=ef16498f66804df1cc6172f6996d5492"
      - "EMQ_WAIT_TIME=60"
    depends_on:
     - emqtt-master
    networks:
      - indonesia-net
      - proxy_indonesia-net
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/TZ:ro
    deploy:
      placement:
        constraints: [node.role == worker]
      labels:
        - com.df.notify=true

        # EMQTT tcp connection
        - com.df.port.1=1883
        - com.df.srcPort.1=1883
        - com.df.reqMode.1=tcp

networks:
  indonesia-net:
    external: false
  proxy_indonesia-net:
    external: true
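
For reference, the stack can be deployed and verified with the standard Swarm commands (matching the deploy command in the file header):

# Deploy the stack defined above
docker stack deploy --compose-file docker-compose-mqtt.yml mqtt

# Verify that both services are running with one replica each
docker stack services mqtt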

The generated haproxy.cfg looks like this:

global
    pidfile /var/run/haproxy.pid
    tune.ssl.default-dh-param 2048

    # disable sslv3, prefer modern ciphers
    ssl-default-bind-options no-sslv3
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS

    ssl-default-server-options no-sslv3
    ssl-default-server-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS

resolvers docker
    nameserver dns 127.0.0.11:53

defaults
    mode    http
    balance roundrobin

    option  dontlognull
    option  dontlog-normal
    option  http-keep-alive
    option  forwardfor
    option  redispatch

    errorfile 400 /errorfiles/400.http
    errorfile 403 /errorfiles/403.http
    errorfile 405 /errorfiles/405.http
    errorfile 408 /errorfiles/408.http
    errorfile 429 /errorfiles/429.http
    errorfile 500 /errorfiles/500.http
    errorfile 502 /errorfiles/502.http
    errorfile 503 /errorfiles/503.http
    errorfile 504 /errorfiles/504.http

    maxconn 5000
    timeout connect 5s
    timeout client  20s
    timeout server  20s
    timeout queue   30s
    timeout tunnel  3600s
    timeout http-request 5s
    timeout http-keep-alive 15s

    stats enable
    stats refresh 30s
    stats realm Strictly\ Private
    stats uri /admin?stats
    stats auth my-user:my-pass

frontend services
    bind *:80
    bind *:443
    mode http

    acl url_mqtt_emqtt-master18083_1 path_beg /
    acl domain_mqtt_emqtt-master18083_1 hdr_beg(host) -i emqtt.indonesia
    use_backend mqtt_emqtt-master-be18083_1 if url_mqtt_emqtt-master18083_1 domain_mqtt_emqtt-master18083_1

frontend tcpFE_1883
    bind *:1883
    mode tcp
    default_backend mqtt_emqtt-master-be1883_2
    default_backend mqtt_emqtt-worker-be1883_1

backend mqtt_emqtt-master-be18083_1
    mode http
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    server mqtt_emqtt-master mqtt_emqtt-master:18083

backend mqtt_emqtt-master-be1883_2
    mode tcp
    server mqtt_emqtt-master mqtt_emqtt-master:1883

backend mqtt_emqtt-worker-be1883_1
    mode tcp
    server mqtt_emqtt-worker mqtt_emqtt-worker:1883

Unfortunately, as I understand it, DFP currently doesn't support load balancing across two (or more) services.

In this case all incoming MQTT TCP connections end up on the worker and none on the master (HAProxy honours only the last default_backend line in a frontend). When the worker is shut down (replicas set to zero), all incoming MQTT TCP connections fail and are not redirected to the master.
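
For completeness, scaling the worker down to zero replicas amounts to standard Swarm commands (service names as generated by the stack above):

# Scale the worker down; new MQTT connections now fail instead of falling back to the master
docker service scale mqtt_emqtt-worker=0

# Confirm the master is still running
docker service ps mqtt_emqtt-master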

What I would like is for DFP to load balance the incoming MQTT TCP connections across the two (or more) services.

Below is an haproxy.cfg example that worked for me before I used DFP:

global
  ulimit-n 99999
  maxconn 99999
  maxpipes 99999
  tune.maxaccept 500
  log 127.0.0.1 local0
  log 127.0.0.1 local1 notice
  chroot /var/lib/haproxy
  user haproxy
  group haproxy

defaults
  log global
  mode http
  option dontlognull
  timeout connect 5000ms
  timeout client 50000ms
  timeout server 50000ms
  errorfile 400 /etc/haproxy/errors/400.http
  errorfile 403 /etc/haproxy/errors/403.http
  errorfile 408 /etc/haproxy/errors/408.http
  errorfile 500 /etc/haproxy/errors/500.http
  errorfile 502 /etc/haproxy/errors/502.http
  errorfile 503 /etc/haproxy/errors/503.http
  errorfile 504 /etc/haproxy/errors/504.http

listen stats :80
  stats enable
  stats uri / # must be present to see the logs
  stats auth admin:admin

listen mqtt
  bind *:1883
  bind *:8883 ssl crt /certs/lelylan-mqtt.pem
  mode tcp
  # Use this to avoid losing the connection when a client is subscribed to a topic and sits idle for some time
  option clitcpka # For TCP keep-alive
  timeout client 3h # By default the TCP keep-alive interval is 2 hours in the OS kernel, see 'cat /proc/sys/net/ipv4/tcp_keepalive_time'
  timeout server 3h # By default the TCP keep-alive interval is 2 hours in the OS kernel
  option tcplog
  balance leastconn
  server mosca_1 178.62.122.204:1883 check
  server mosca_2 178.62.104.172:1883 check

Hope you guys have time to look into this :-)

thomasjpfan commented 6 years ago

I will look into this.

RaymondMouthaan commented 6 years ago

Thanks Thomas, if you need assistance, just let me know!

thomasjpfan commented 6 years ago

Here is a possible interface for this feature:

services:
  emqtt-master:
    deploy:
      labels:
        ...
        - com.df.reqMode.2=tcp
        - com.df.serviceGroup.2=emqtt-group
        - com.df.tcpCheckPort.2=1881
        - com.df.port.2=1883
        - com.df.srcPort.2=1883
  emqtt-worker:
    deploy:
      labels:
        ...
        - com.df.reqMode.1=tcp
        - com.df.serviceGroup.1=emqtt-group
        - com.df.tcpCheckPort.1=1880
        - com.df.port.1=1883
        - com.df.srcPort.1=1883

This places both services in the same frontend group, named emqtt-group. Setting com.df.tcpCheckPort enables TCP health checks on that port, i.e. the HAProxy config will look something like this:

frontend tcp_emqtt-group_1883:
    bind *:1883
    mode tcp
    default_backend emqtt-group-be1883

backend emqtt-group-be1883:
    mode tcp
    server emqtt-master emqtt-master:1883 check port 1881
    server emqtt-worker emqtt-worker:1883 check port 1880

Setting com.df.port sets the public port for emqtt-group. If emqtt-master and emqtt-worker have conflicting com.df.port values, the listening port for the emqtt-group frontend could end up being either one, so the user would need to know to set these two ports to the same value.
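
Either way, the configuration DFP actually generates can be inspected through its config endpoint, which would make such a port mismatch easy to spot (a sketch, assuming the proxy service is reachable as proxy on a shared network and its API listens on the default port 8080):

# Print the HAProxy configuration currently served by Docker Flow Proxy
curl http://proxy:8080/v1/docker-flow-proxy/config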

What do you think of this interface?

RaymondMouthaan commented 6 years ago

I like the idea of grouping the services. I only wonder how that works when a service has replicas set to more than 1 (which is not the case in my emqtt setup, btw).

Wouldn't it be easier to let Docker decide the com.df.port so that the exposed ports are not fixed?

Example haproxy.cfg:

  server mosca_1 178.62.122.204:1883 check
  server mosca_2 178.62.104.172:1883 check

Your proposal:

    server emqtt-master emqtt-master:1883 check port 1881
    server emqtt-worker emqtt-worker:1883 check port 1880

You explicitly run those checks on a dedicated port, while in my example they are left out. My example also contains some other settings:

# Use this to avoid losing the connection when a client is subscribed to a topic and sits idle for some time
  option clitcpka # For TCP keep-alive
  timeout client 3h # By default the TCP keep-alive interval is 2 hours in the OS kernel, see 'cat /proc/sys/net/ipv4/tcp_keepalive_time'
  timeout server 3h # By default the TCP keep-alive interval is 2 hours in the OS kernel
  option tcplog
  balance leastconn

Just like with web applications, an MQTT client connects to mymqtt.server.com on 1883, which is HAProxy. HAProxy should balance between the master and the worker. If one of them goes down, the client should be redirected to the MQTT broker that is still alive.
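
As a concrete check, that behaviour could be exercised with the standard Mosquitto clients (mymqtt.server.com is just a placeholder for the proxy address):

# Keep a subscriber connected through HAProxy while brokers are scaled up and down
mosquitto_sub -h mymqtt.server.com -p 1883 -t 'test/#' -v

# Publish through the same entry point; messages should keep arriving after a failover
mosquitto_pub -h mymqtt.server.com -p 1883 -t test/failover -m hello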

If a test version of this feature becomes available, I can test it out in a real-life situation.

Thanks for your effort 👍

thomasjpfan commented 6 years ago

  1. Maybe it would be better to just have com.df.tcpCheck=true to enable TCP checking. Please see the example below.

  2. There is a way to configure DFSL to send the IPs of the services so we can use the IPs for the backends. This would handle the case where replicas >= 1. If this is a desirable feature, I can add it to DFP.

  3. For the complete config, I propose this compose file:

services:
  emqtt-master:
    deploy:
      labels:
        ...
        - com.df.reqMode.2=tcp
        - com.df.serviceGroup.2=emqtt-group
        - com.df.tcpCheck.2=true
        - com.df.port.2=1883
        - com.df.srcPort.2=1883
        - com.df.balance.2=leastconn
        - com.df.timeoutServer.2=10800
        - com.df.timeoutClient.2=10800
        - com.df.clitcpka.2=true
  emqtt-worker:
    deploy:
      labels:
        ...
        - com.df.reqMode.1=tcp
        - com.df.serviceGroup.1=emqtt-group
        - com.df.tcpCheck.1=true
        - com.df.port.1=1883
        - com.df.srcPort.1=1883
        - com.df.balance.1=leastconn
        - com.df.timeoutServer.1=10800
        - com.df.timeoutClient.1=10800
        - com.df.clitcpka.1=true

which will result in:

frontend tcp_emqtt-group_1883:
    bind *:1883
    mode tcp
    option tcplog
    option clitcpka
    timeout client 10800s
    default_backend emqtt-group-be1883

backend emqtt-group-be1883:
    mode tcp
    timeout server 10800s
    balance leastconn
    server emqtt-master emqtt-master:1883 check
    server emqtt-worker emqtt-worker:1883 check

This should work for your use case with replicas == 1. Does this work for you?

RaymondMouthaan commented 6 years ago

@thomasjpfan I think your proposal is exactly what I need, and it possibly also solves #6. I can't wait to test whether it works.

RaymondMouthaan commented 6 years ago

@thomasjpfan, just curious, is there any update on this? 😄

thomasjpfan commented 6 years ago

I made some changes to the original proposal. To use this feature for your use case, the stack file would be:

services:
  proxy:
    image: dockerflow/docker-flow-proxy
    environment:
      - DEBUG=true
  emqtt-master:
    deploy:
      labels:
        ...
        - com.df.reqMode.2=tcp
        - com.df.serviceGroup.2=emqtt-group
        - com.df.checkTcp.2=true
        - com.df.port.2=1883
        - com.df.srcPort.2=1883
        - com.df.balanceGroup.2=leastconn
        - com.df.timeoutServer.2=10800
        - com.df.timeoutClient.2=10800
        - com.df.clitcpka.2=true
  emqtt-worker:
    deploy:
      labels:
        ...
        - com.df.reqMode.1=tcp
        - com.df.serviceGroup.1=emqtt-group
        - com.df.checkTcp.1=true
        - com.df.port.1=1883
        - com.df.srcPort.1=1883
        - com.df.balanceGroup.1=leastconn
        - com.df.timeoutServer.1=10800
        - com.df.timeoutClient.1=10800
        - com.df.clitcpka.1=true

This would result in this configuration:

listen tcp_emqtt-group_1883:
    bind *:1883
    mode tcp
    option tcplog
    log global
    option clitcpka
    option tcp-check
    timeout client 10800s
    timeout server 10800s
    balance leastconn
    server mqtt_emqtt-master_1883_0 *EMQTT-MASTER_IP_REPLICA_1*:1883 check
    server mqtt_emqtt-worker_1883_0 *EMQTT-WORKER_IP_REPLICA_1*:1883 check

The DEBUG=true flag in the environment enables option tcplog and log global.
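
The *EMQTT-MASTER_IP_REPLICA_1* and *EMQTT-WORKER_IP_REPLICA_1* placeholders are the task IPs that DFP receives from the swarm listener (point 2 in the earlier comment). Assuming that is driven by DFSL's DF_INCLUDE_NODE_IP_INFO option, and a listener service named proxy_swarm-listener, enabling it would look something like:

# Have DFSL include task IP information in its notifications to DFP
docker service update --env-add DF_INCLUDE_NODE_IP_INFO=true proxy_swarm-listener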

You can test out the TCP groups feature by using dockerflow/docker-flow-proxy:18.05.08-45 or later.
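
Assuming the proxy runs as a Swarm service named proxy_proxy, switching it to that image is a single command:

# Roll the proxy service onto the build containing the TCP groups feature
docker service update --image dockerflow/docker-flow-proxy:18.05.08-45 proxy_proxy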

RaymondMouthaan commented 6 years ago

@thomasjpfan, I just tested the above stack file and it works exactly how I hoped it would! 💯

I have tested the following scenario:

  1. Both services, emqtt-master and emqtt-worker, up and running with replicas set to 1.
  2. Checked the emqtt dashboard and verified that all client connections are load balanced equally over the two emqtt brokers 💯.
  3. Then I set the worker service's replicas to zero.
  4. The master took over all connections, balanced by HAProxy 💯.
  5. Then I set the worker service back to 1 replica.
  6. All client connections remained active on the master service 💯.
  7. Then I set the master service's replicas to zero.
  8. All connections moved to the worker service 💯.
  9. Then I set the master service back to 1 replica.
  10. All client connections remained active on the worker service 💯.
  11. Updating the DFP service dropped all connections, and they reconnected balanced equally, as in step 2 💯.

So basically it works perfectly, and I am really happy you implemented this feature in DFP. Thank you 🥇 🥇 🥇.

Note: I noticed an issue (#18) with the Docker image tag for linux-arm; I'll investigate what's going wrong and open a PR for it.