c0sogi / LLMChat

A full-stack web UI implementation of large language models, such as ChatGPT or LLaMA.
MIT License
245 stars, 40 forks

Fails on Docker swarm #32

Closed Torhamilton closed 1 year ago

Torhamilton commented 1 year ago

"WARNING You Probably Don't Need this Docker Image: " We should follow that advice and remove Gunicorn and the requirement for forwarded-ip. You can't use a fixed IP for Traefik in swarm mode or Kubernetes. It is also better to let swarm manage replication instead of Gunicorn workers.

c0sogi commented 1 year ago

If you use Traefik to route to an API container without a forwarded IP, the client's IP will not be logged. Is this okay?

Torhamilton commented 1 year ago

Local Traefik networks are usually trusted. We leave security to fail2ban and other Traefik plugins. This is a critical issue.

c0sogi commented 1 year ago

I'm not familiar with Kubernetes and Docker swarms, but doesn't Traefik need to pre-set a trusted IP in order to reference the X-Forwarded-For header? It seems like this would be necessary to prevent spoofing when routing through multiple proxies and the accessor's original IP is passed in the header.

Torhamilton commented 1 year ago

Traefik has this covered for swarm, compose, or k8s. Everything on the internal Traefik network can be trusted. We use forwardedHeaders.trustedIPs.

Since Traefik is the edge router within the Docker Compose network, and it's the first point of contact for incoming requests, you can trust all IPs within the Docker Compose network (because they are all internal to the network and under your control). This means that you could potentially specify the entire subnet of the Docker Compose network as trusted in the Traefik configuration.

However, note that the X-Forwarded-* and Forwarded headers are typically used in situations where there are multiple hops between the client and the server (for example, client -> load balancer -> proxy -> server), so if Traefik is the only hop between the client and the server, you might not need to use these headers at all.
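
As a rough sketch of the trusted-subnet idea described above: the fragment below pins the compose network to a subnet and lists that subnet in forwardedHeaders.trustedIPs on the websecure entrypoint. The subnet 172.28.0.0/16 is a made-up value for illustration, not something from this repository. Note that this option controls which upstream peers Traefik accepts X-Forwarded-* headers from, so it only matters when another proxy or load balancer sits in front of Traefik.

```yaml
networks:
  reverse-proxy-public:
    driver: bridge
    ipam:
      driver: default
      config:
        # Hypothetical subnet; choose one that does not clash with your host.
        - subnet: 172.28.0.0/16

services:
  proxy:
    image: traefik
    command:
      - "--entrypoints.websecure.address=:443"
      # Accept X-Forwarded-* headers only from peers inside the subnet above.
      - "--entrypoints.websecure.forwardedHeaders.trustedIPs=172.28.0.0/16"
    networks:
      - reverse-proxy-public
```

If Traefik is the only hop, leaving trustedIPs unset should be fine, since Traefik then fills in the forwarded headers itself from the connection it sees.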

c0sogi commented 1 year ago

Do we really need to remove Gunicorn? There may be cases where we deploy without using Kubernetes, and since Gunicorn is responsible for distributing the workers, it seems unlikely that using Kubernetes will cause any problems.

Torhamilton commented 1 year ago

I believe compose supports replication too. Also I noticed Gunicorn makes the image very large. However we can leave it for now.
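
For illustration, replication at the orchestrator level could look roughly like the fragment below. This is a sketch rather than the project's configuration: the replica count is arbitrary, and it assumes either `docker stack deploy` on a swarm or a recent Docker Compose release that honors the deploy key.

```yaml
services:
  api:
    image: cosogi/llmchat:230610
    deploy:
      # Hypothetical replica count; the orchestrator starts this many
      # containers instead of Gunicorn forking workers inside a single one.
      replicas: 2
```

One caveat for swarm: Traefik reads routing labels from the service there, so they would need to sit under deploy.labels rather than the container-level labels key.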

c0sogi commented 1 year ago

I didn't remove Gunicorn, but I did remove the static IP dependency for production compose in the new commit. Please leave a comment if you have anything to add.

e2d3b92feb0d316e25576dfc9070177f13746806
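
One way a fixed proxy IP can be avoided, shown only as a sketch and not necessarily what the commit above does: both Uvicorn and Gunicorn read the FORWARDED_ALLOW_IPS environment variable as the default for their --forwarded-allow-ips option, and "*" tells them to accept proxy headers from any peer. The assumption here is that the server inside the image is one of those two.

```yaml
services:
  api:
    image: cosogi/llmchat:230610
    environment:
      # Assumption: the image runs Uvicorn or Gunicorn, which use this
      # variable as the default for --forwarded-allow-ips.
      # "*" trusts X-Forwarded-* headers from every peer, so use it only
      # when the container is reachable solely through the proxy network.
      - "FORWARDED_ALLOW_IPS=*"
    networks:
      - reverse-proxy-public
```

This trades the static address for trust in the whole network, which is reasonable only as long as the api service publishes no ports of its own.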

Torhamilton commented 1 year ago

I get a gateway timeout (error 504). It worked only twice, and it fails after docker-compose down/up. We need to guarantee the connection will always be accepted.

c0sogi commented 1 year ago

This is caused by Traefik not picking up the reverse-proxy network because we added a private network. It appears that the problem only affects the prod compose. First, please add "traefik.docker.network=reverse-proxy-public" to the labels of your api. For example, you can define it as follows. I'll fix it in the next commit.

  api:
    image: cosogi/llmchat:230610
    restart: always
    env_file:
      - .env
    command:
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "8000"
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=reverse-proxy-public"
      - "traefik.http.routers.api.rule=HostRegexp(`${HOST_MAIN}`, `{subdomain:[a-z]+}.${HOST_MAIN}`, `${HOST_IP}`)"
      - "traefik.http.routers.api.entrypoints=websecure"
      - "traefik.http.services.api.loadbalancer.server.scheme=http"
      - "traefik.http.services.api.loadbalancer.server.port=8000"
      - "traefik.http.routers.api.tls=true"
      - "traefik.http.routers.api.tls.certresolver=myresolver"
      - "traefik.http.routers.api.tls.domains[0].main=${HOST_MAIN}"
      - "traefik.http.routers.api.tls.domains[0].sans=${HOST_SUB}"
    depends_on:
      - proxy
      - db
      - cache
      - vectorstore
    volumes:
      - .:/app
    networks:
      - api-private
      - reverse-proxy-public
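
As an alternative sketch to the per-service label: Traefik's Docker provider also has a provider-wide option that sets the default network for reaching every container, and the traefik.docker.network label still overrides it per container. The explicit name key is an addition for illustration, since Compose normally prefixes network names with the project name and the value Traefik receives has to match what `docker network ls` shows.

```yaml
networks:
  reverse-proxy-public:
    # Fixes the Docker network name so it is not prefixed with the project name.
    name: reverse-proxy-public
    driver: bridge

services:
  proxy:
    image: traefik
    command:
      - "--providers.docker"
      - "--providers.docker.exposedbydefault=false"
      # Default network Traefik uses to connect to all containers.
      - "--providers.docker.network=reverse-proxy-public"
    networks:
      - reverse-proxy-public
```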

c0sogi commented 1 year ago

If you're still experiencing issues, delete api-private. There seems to be a bug in Traefik if you are using multiple networks.

version: '3.9'

volumes:
  mysql:
  redis:
  qdrant:

networks:
  reverse-proxy-public:
    driver: bridge
    ipam:
      driver: default

services:
  proxy:
    image: traefik
    command:
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.mysql.address=:3306"
      - "--entrypoints.redis.address=:6379"
      - "--entryPoints.web.http.redirections.entryPoint.to=websecure"
      - "--entryPoints.web.http.redirections.entryPoint.scheme=https"
      - "--providers.docker"
      - "--providers.docker.exposedbydefault=false"
      - "--api.insecure=false"
      - "--certificatesresolvers.myresolver.acme.tlschallenge=true"
      - "--certificatesresolvers.myresolver.acme.email=${MY_EMAIL}"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
      # - "--log.level=DEBUG"
      # - "--certificatesresolvers.myresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
    ports:
      - "80:80"
      - "443:443"
      - "3306:3306"
      - "6379:6379"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./letsencrypt:/letsencrypt
    networks:
      - reverse-proxy-public

  db:
    image: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: "${MYSQL_ROOT_PASSWORD}"
      MYSQL_ROOT_HOST: "%"
      MYSQL_DATABASE: "${MYSQL_DATABASE}"
      MYSQL_USER: "${MYSQL_USER}"
      MYSQL_PASSWORD: "${MYSQL_PASSWORD}"
      TZ: "Asia/Seoul"
    volumes:
      - mysql:/var/lib/mysql
      # - ./my.cnf:/etc/mysql/conf.d/my.cnf
    labels:
      - "traefik.enable=true"
      - "traefik.tcp.routers.db.rule=HostSNI(`*`)"
      - "traefik.tcp.services.db.loadbalancer.server.port=3306"
      - "traefik.tcp.routers.db.entrypoints=mysql"
    networks:
      - reverse-proxy-public

  cache:
    image: redis/redis-stack-server:latest
    restart: always
    environment:
      - REDIS_ARGS=--requirepass ${REDIS_PASSWORD} --maxmemory 100mb --maxmemory-policy allkeys-lru --appendonly yes
    volumes:
      - redis:/data
    labels:
      - "traefik.enable=true"
      - "traefik.tcp.routers.cache.rule=HostSNI(`*`)"
      - "traefik.tcp.services.cache.loadbalancer.server.port=6379"
      - "traefik.tcp.routers.cache.entrypoints=redis"
    networks:
      - reverse-proxy-public

  api:
    image: cosogi/llmchat:230610
    restart: always
    env_file:
      - .env
    command:
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "8000"
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=reverse-proxy-public"
      - "traefik.http.routers.api.rule=HostRegexp(`${HOST_MAIN}`, `{subdomain:[a-z]+}.${HOST_MAIN}`, `${HOST_IP}`)"
      - "traefik.http.routers.api.entrypoints=websecure"
      - "traefik.http.services.api.loadbalancer.server.scheme=http"
      - "traefik.http.services.api.loadbalancer.server.port=8000"
      - "traefik.http.routers.api.tls=true"
      - "traefik.http.routers.api.tls.certresolver=myresolver"
      - "traefik.http.routers.api.tls.domains[0].main=${HOST_MAIN}"
      - "traefik.http.routers.api.tls.domains[0].sans=${HOST_SUB}"
    depends_on:
      - proxy
      - db
      - cache
      - vectorstore
    volumes:
      - .:/app
    networks:
      - reverse-proxy-public

  vectorstore:
    image: qdrant/qdrant:latest
    restart: always
    volumes:
      - qdrant:/qdrant/storage
    networks:
      - reverse-proxy-public

Torhamilton commented 1 year ago

Thanks. I will update the compose file.

Torhamilton commented 1 year ago

Works.