goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0
23.89k stars · 4.74k forks

notary-server randomly becomes unreachable #18917

Closed · hexxone closed this issue 1 year ago

hexxone commented 1 year ago
Jul 11 12:27:29 172.21.0.1 proxy[1597085]: 10.120.10.10 - "GET /api/v2.0/users/current/permissions?scope=/project/3&relative=true HTTP/1.1" 200 3064 "https://docker.sample.net/harbor/projects/3/repositories" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" 0.003 0.003 .
Jul 11 12:27:29 172.21.0.1 proxy[1597085]: 10.120.10.10 - "GET /api/v2.0/projects/3 HTTP/1.1" 200 370 "https://docker.sample.net/harbor/projects/3/repositories" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" 0.005 0.005 .
Jul 11 12:27:29 172.21.0.1 proxy[1597085]: 10.120.10.10 - "GET /api/v2.0/scanners?page_size=15&page=1 HTTP/1.1" 200 340 "https://docker.sample.net/harbor/projects/3/repositories" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" 0.006 0.006 .
Jul 11 12:27:29 172.21.0.1 proxy[1597085]: 10.120.10.10 - "GET /api/v2.0/systeminfo HTTP/1.1" 200 411 "https://docker.sample.net/harbor/projects/3/repositories" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" 0.004 0.004 .
Jul 11 12:27:29 172.21.0.1 proxy[1597085]: 10.120.10.10 - "GET /api/v2.0/projects/3/summary HTTP/1.1" 200 103 "https://docker.sample.net/harbor/projects/3/repositories" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" 0.016 0.016 .
Jul 11 12:27:29 172.21.0.1 proxy[1597085]: 10.120.10.10 - "GET /api/v2.0/export/cve/executions HTTP/1.1" 200 13 "https://docker.sample.net/harbor/projects/3/repositories" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" 0.007 0.007 .
Jul 11 12:27:29 172.21.0.1 proxy[1597085]: 10.120.10.10 - "GET /api/v2.0/projects/sample/repositories?page_size=15&page=1 HTTP/1.1" 200 1269 "https://docker.sample.net/harbor/projects/3/repositories" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" 0.016 0.015 .
Jul 11 12:27:48 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:28:18 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:28:48 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:29:18 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:29:48 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:30:18 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:30:48 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:31:18 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:31:49 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:32:19 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:32:49 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:33:19 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:33:49 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:34:19 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:34:49 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:35:19 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:35:49 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:36:20 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:36:50 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:37:20 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:37:50 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:38:20 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:38:50 172.21.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:41:04 172.22.0.1 proxy[1597085]: 127.0.0.1 - "GET / HTTP/1.1" 308 171 "-" "curl/8.0.1" 0.000 - .
Jul 11 12:43:30 172.25.0.1 proxy[1597085]: 2023/07/11 10:43:30 [emerg] 1#0: host not found in upstream "notary-server:4443" in /etc/nginx/conf.d/notary.upstream.conf:2
Jul 11 12:43:30 172.25.0.1 proxy[1597085]: nginx: [emerg] host not found in upstream "notary-server:4443" in /etc/nginx/conf.d/notary.upstream.conf:2
Jul 11 12:43:31 172.25.0.1 proxy[1597085]: 2023/07/11 10:43:31 [emerg] 1#0: host not found in upstream "notary-server:4443" in /etc/nginx/conf.d/notary.upstream.conf:2
Jul 11 12:43:31 172.25.0.1 proxy[1597085]: nginx: [emerg] host not found in upstream "notary-server:4443" in /etc/nginx/conf.d/notary.upstream.conf:2

Expected behavior and actual behavior:

I expect the harbor notary-server to be consistently reachable during runtime. However, the notary-server becomes unreachable while the system is running.

Steps to reproduce the problem:

The issue occurs intermittently during runtime, and the exact steps to reproduce it are still unclear. However, it seems to manifest after several successful operations, as can be seen from the log files.

Versions:

Additional context:

The problem seems to be the notary-server becoming unreachable, possibly due to a networking issue or a problem with the notary-server container itself. The nginx error message indicates that the upstream host name cannot be resolved when the configuration is loaded.
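
One way to keep nginx from exiting when an upstream host name cannot be resolved (a sketch, not Harbor's shipped config; the server block and file layout here are assumptions) is to resolve the name at request time through Docker's embedded DNS instead of a static upstream block:

# hypothetical replacement for the static "upstream notary-server" definition
server {
    listen 4443;
    location / {
        resolver 127.0.0.11 valid=30s;       # Docker's embedded DNS, as seen in the logs
        set $notary_backend "notary-server:4443";
        proxy_pass https://$notary_backend;  # resolved per request, not at startup
    }
}

With a variable in proxy_pass, nginx defers the DNS lookup, so a missing "notary-server" produces a 502 at request time rather than an [emerg] at startup.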

However, after running docker compose down and up again, the issue sometimes disappears from the proxy and appears in notary-server itself instead, which then reports that notary-signer is unavailable:

Jul 10 16:23:34 192.168.128.1 notary-server[1381]: notaryserver database migrated to latest version
Jul 10 16:23:34 192.168.128.1 notary-server[1381]: {"level":"info","msg":"Version: 0.6.1, Git commit: d6e1431f","time":"2023-07-10T14:23:34Z"}
Jul 10 16:23:34 192.168.128.1 notary-server[1381]: {"level":"info","msg":"Using remote signing service","time":"2023-07-10T14:23:34Z"}
Jul 10 16:23:34 192.168.128.1 notary-server[1381]: {"level":"info","msg":"Using postgres backend","time":"2023-07-10T14:23:34Z"}
Jul 10 16:23:34 192.168.128.1 notary-server[1381]: {"level":"info","msg":"Starting Server","time":"2023-07-10T14:23:34Z"}
Jul 10 16:23:34 192.168.128.1 notary-server[1381]: {"level":"info","msg":"Starting on :4443","time":"2023-07-10T14:23:34Z"}
Jul 10 16:23:34 192.168.128.1 notary-server[1381]: 2023/07/10 14:23:34 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:23:35 192.168.128.1 notary-server[1381]: 2023/07/10 14:23:35 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:23:36 192.168.128.1 notary-server[1381]: 2023/07/10 14:23:36 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:23:39 192.168.128.1 notary-server[1381]: 2023/07/10 14:23:39 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:23:43 192.168.128.1 notary-server[1381]: 2023/07/10 14:23:43 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:23:44 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:23:44Z"}
Jul 10 16:23:49 192.168.128.1 notary-server[1381]: 2023/07/10 14:23:49 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:23:54 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:23:54Z"}
Jul 10 16:23:59 192.168.128.1 notary-server[1381]: 2023/07/10 14:23:59 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:24:04 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:24:04Z"}
Jul 10 16:24:14 192.168.128.1 notary-server[1381]: 2023/07/10 14:24:14 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:24:14 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:24:14Z"}
Jul 10 16:24:24 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:24:24Z"}
Jul 10 16:24:34 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:24:34Z"}
Jul 10 16:24:41 192.168.128.1 notary-server[1381]: 2023/07/10 14:24:41 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:24:44 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:24:44Z"}
Jul 10 16:24:54 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:24:54Z"}
Jul 10 16:25:04 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:25:04Z"}
Jul 10 16:25:14 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:25:14Z"}
Jul 10 16:25:21 192.168.128.1 notary-server[1381]: 2023/07/10 14:25:21 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp: lookup notarysigner on 127.0.0.11:53: no such host"; Reconnecting to {notarysigner:7899 <nil>}
Jul 10 16:25:24 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:25:24Z"}
Jul 10 16:25:34 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:25:34Z"}
Jul 10 16:25:44 192.168.128.1 notary-server[1381]: {"level":"error","msg":"Trust not fully operational: rpc error: code = 14 desc = grpc: the connection is unavailable","time":"2023-07-10T14:25:44Z"}

....

For installing, I followed the documented procedure and used ./install.sh --with-notary --with-trivy.

The problem is that we already use a Traefik reverse proxy to handle SSL and cannot expose the web server ports, so we had to manually modify the generated docker-compose file.

Running the install or prepare script again destroys our customized config.

The weird thing is that the registry was working at first and only later started acting up, as you can see in the logs.

Pushing and pulling various images was already working; everything was fine.

version: '2.3'
services:
  # log:
  #   image: goharbor/harbor-log:$HARBOR_VERSION
  #   container_name: harbor-log
  #   restart: always
  #   cap_drop:
  #     - ALL
  #   cap_add:
  #     - CHOWN
  #     - DAC_OVERRIDE
  #     - SETGID
  #     - SETUID
  #   volumes:
  #     - ./log/:/var/log/docker/:z
  #     - type: bind
  #       source: ./common/config/log/logrotate.conf
  #       target: /etc/logrotate.d/logrotate.conf
  #     - type: bind
  #       source: ./common/config/log/rsyslog_docker.conf
  #       target: /etc/rsyslog.d/rsyslog_docker.conf
  #   ports:
  #     - 127.0.0.1:1514:10514
  #   networks:
  #     - harbor
  registry:
    image: goharbor/registry-photon:$HARBOR_VERSION
    container_name: registry
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    volumes:
      - ./data/registry:/storage:z
      - ./common/config/registry/:/etc/registry/:z
      - type: bind
        source: ./data/secret/registry/root.crt
        target: /etc/registry/root.crt
      - type: bind
        source: ./common/config/shared/trust-certificates
        target: /harbor_cust_cert
    networks:
      - harbor
    # depends_on:
    #   - log
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "registry"
  registryctl:
    image: goharbor/harbor-registryctl:$HARBOR_VERSION
    container_name: registryctl
    env_file:
      - ./common/config/registryctl/env
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    volumes:
      - ./data/registry:/storage:z
      - ./common/config/registry/:/etc/registry/:z
      - type: bind
        source: ./common/config/registryctl/config.yml
        target: /etc/registryctl/config.yml
      - type: bind
        source: ./common/config/shared/trust-certificates
        target: /harbor_cust_cert
    networks:
      - harbor
    # depends_on:
    #   - log
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "registryctl"
  postgresql:
    image: goharbor/harbor-db:$HARBOR_VERSION
    container_name: harbor-db
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - SETGID
      - SETUID
    volumes:
      - ./data/database:/var/lib/postgresql/data:z
    networks:
      harbor:
    env_file:
      - ./common/config/db/env
    # depends_on:
    #   - log
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "postgresql"
    shm_size: '1gb'
  core:
    image: goharbor/harbor-core:$HARBOR_VERSION
    container_name: harbor-core
    env_file:
      - ./common/config/core/env
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - SETGID
      - SETUID
    volumes:
      - ./data/ca_download/:/etc/core/ca/:z
      - ./data/:/data/:z
      - ./common/config/core/certificates/:/etc/core/certificates/:z
      - type: bind
        source: ./common/config/core/app.conf
        target: /etc/core/app.conf
      - type: bind
        source: ./data/secret/core/private_key.pem
        target: /etc/core/private_key.pem
      - type: bind
        source: ./data/secret/keys/secretkey
        target: /etc/core/key
      - type: bind
        source: ./common/config/shared/trust-certificates
        target: /harbor_cust_cert
    networks:
      harbor:
    depends_on:
      # - log
      - registry
      - redis
      - postgresql
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "core"
  portal:
    image: goharbor/harbor-portal:$HARBOR_VERSION
    container_name: harbor-portal
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
      - NET_BIND_SERVICE
    volumes:
      - type: bind
        source: ./common/config/portal/nginx.conf
        target: /etc/nginx/nginx.conf
    networks:
      - harbor
    # depends_on:
    #   - log
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "portal"

  jobservice:
    image: goharbor/harbor-jobservice:$HARBOR_VERSION
    container_name: harbor-jobservice
    env_file:
      - ./common/config/jobservice/env
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    volumes:
      - ./data/job_logs:/var/log/jobs:z
      - type: bind
        source: ./common/config/jobservice/config.yml
        target: /etc/jobservice/config.yml
      - type: bind
        source: ./common/config/shared/trust-certificates
        target: /harbor_cust_cert
    networks:
      - harbor
    depends_on:
      - core
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "jobservice"
  redis:
    image: goharbor/redis-photon:$HARBOR_VERSION
    container_name: redis
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    volumes:
      - ./data/redis:/var/lib/redis
    networks:
      harbor:
    # depends_on:
    #   - log
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "redis"
  proxy:
    image: goharbor/nginx-photon:$HARBOR_VERSION
    container_name: nginx
    restart: always
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
      - NET_BIND_SERVICE
    volumes:
      - ./common/config/nginx:/etc/nginx:z
      - type: bind
        source: ./common/config/shared/trust-certificates
        target: /harbor_cust_cert
    networks:
      - harbor
      - web
    # ports:
    #   - 80:8080
    labels:
      - "traefik.enable=true"
      # router 80 -> redirect
      - "traefik.http.routers.harbor-unsecure.entrypoints=web"
      - "traefik.http.routers.harbor-unsecure.rule=Host(`docker.${COMPANY_HOST}`)"
      - "traefik.http.routers.harbor-unsecure.middlewares=httpsredirect@docker"
      # router 443 -> http:8080
      - "traefik.http.routers.harbor.entrypoints=websecure"
      - "traefik.http.routers.harbor.rule=Host(`docker.${COMPANY_HOST}`)"
      - "traefik.http.routers.harbor.tls=true"
      - "traefik.http.routers.harbor.tls.certresolver=cfresolver"
      - "traefik.http.routers.harbor.tls.domains[0].main=${COMPANY_HOST}"
      - "traefik.http.routers.harbor.tls.domains[0].sans=*.${COMPANY_HOST}"
      - "traefik.http.routers.harbor.middlewares=error-pages@docker"
      - "traefik.http.routers.harbor.service=harborsecure"
      - "traefik.docker.network=web"
      # map service 
      - "traefik.http.services.harborsecure.loadbalancer.server.port=8080"
      - "traefik.http.services.harborsecure.loadbalancer.passhostheader=true"
    depends_on:
      - registry
      - core
      - portal
      # - log
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "proxy"
  trivy-adapter:
    container_name: trivy-adapter
    image: goharbor/trivy-adapter-photon:$HARBOR_VERSION
    restart: always
    cap_drop:
      - ALL
    depends_on:
      # - log
      - redis
    networks:
      - harbor
    volumes:
      - type: bind
        source: ./data/trivy-adapter/trivy
        target: /home/scanner/.cache/trivy
      - type: bind
        source: ./data/trivy-adapter/reports
        target: /home/scanner/.cache/reports
      - type: bind
        source: ./common/config/shared/trust-certificates
        target: /harbor_cust_cert
    # logging:
    #   driver: "syslog"
    #   options:
    #     syslog-address: "tcp://localhost:1514"
    #     tag: "trivy-adapter"
    env_file:
      - ./common/config/trivy-adapter/env
networks:
  harbor:
    # external: false
  # IMPORTANT for Traefik
  web:
    external: true
hexxone commented 1 year ago

Update:

after throwing everything away, running install.sh --with-trivy (without notary), and making only the bare minimum of changes to docker-compose, it is still broken.

I have now spent approximately 8h trying different things, starting from scratch, etc.

Problem stays the same. Some containers will ALWAYS fail to communicate.

The containers are on the same network and have the correct hostnames, and when I start each container manually with docker compose run, communication works as well.

Only when bringing up the whole stack do these issues appear.

I don't understand why you choose to do the major configuration work with an opaque and unclear install script instead of documenting the procedure in detail. This is a major headache...

hexxone commented 1 year ago

The issue was probably caused by the "web" network: Docker seems to attach only a single network when the container starts, and which one it picks appears to be random.

So when the proxy container gets created, it may initially have only the "web" network attached. If that happens, nginx is unable to resolve the upstream service and the container crashes immediately.

Removing the "web" network fixed the issue. We now attach the harbor network to the Traefik container instead.
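
In compose terms, that change looks roughly like this (a sketch; the Traefik service name and the external network name are assumptions, and the actual network name depends on the Harbor compose project prefix):

# in the Traefik stack's docker-compose.yml
services:
  traefik:
    networks:
      - web
      - harbor
networks:
  web:
    external: true
  harbor:
    external: true
    # name: harbor_harbor   # actual name depends on the Harbor compose project

After this, Traefik can reach the Harbor proxy over the harbor network, and the Harbor proxy itself only ever joins the one network where all its upstreams live.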

But perhaps it would be better practice to let nginx return a 502 error and retry instead of crashing.

Maybe like this:

location / {
  proxy_pass http://portal/;
  proxy_next_upstream off;  # Add this line
  ...
}

location /c/ {
  proxy_pass http://core/c/;
  proxy_next_upstream off;  # Add this line
  ...
}
...

With this change, nginx should display its default 502 or 504 error page when it can't reach the upstream server...
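
An explicit fallback could also be added so clients get a controlled response instead of the default error page (a sketch, assuming the location blocks above; the named location is hypothetical):

location / {
    proxy_pass http://portal/;
    proxy_next_upstream off;
    error_page 502 504 = @portal_down;   # hypothetical named location
}

location @portal_down {
    return 503 'portal temporarily unavailable';
}

Note that this only helps at request time: if the host name is unresolvable when nginx loads its configuration, a static upstream definition still fails with [emerg], which is why deferring resolution (a variable in proxy_pass plus a resolver directive) is needed to survive startup.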