SUSE / Portus

Authorization service and frontend for Docker registry (v2)
http://port.us.org/
Apache License 2.0

502 Bad gateway / panic: runtime error: invalid memory address or nil pointer dereference #2248

Closed: kinland closed this issue 4 years ago

kinland commented 4 years ago

Description

After our Portus docker image has been up for ~1 month, we start getting 502 Bad Gateway errors when we try to access Portus via a web browser.

If we run docker logs <portus_container_id>, the last logged event is a pull request for an image... no errors.

We're unfortunately unable to set PORTUS_LOG_LEVEL to debug, because Docker won't let you change an environment variable on a running container (quite an oversight). We'll set it when we restart the container, so the next time this occurs we'll have more information.
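For reference, since Docker only reads environment variables at container creation, one option is to add the variable to the service definition in the compose file and then recreate just that service with docker-compose up -d portus (sketch; service name taken from our compose file below):

```yaml
# Sketch: set the log level in docker-compose.yml, then recreate the service.
services:
  portus:
    environment:
      - PORTUS_LOG_LEVEL=debug
```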

We did, however, find that we could get an error to show up using portusctrl...

Steps to reproduce

Unfortunately, the person who set up our Portus install has left the company, but the workaround he provided was to just restart it (docker-compose restart).

  1. Spin up Portus... wait 3+ weeks.
  2. Attempting to access the Portus website results in 502 Bad Gateway
  3. SSH into our Amazon EC2 instance that is running nginx/portus/etc
  4. Run docker exec <container> portusctl -u=<user> -t=<token> -s=0.0.0.0:3000 health, which panics:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x55779c01899f]

goroutine 1 [running]:
main.buildRequest(0x55779c0229d7, 0x3, 0xc42001c4c0, 0xe, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42001c4c0, ...)
        /home/abuild/rpmbuild/BUILD/portusctl-master/http.go:58 +0x5f
main.request(0x55779c0229d7, 0x3, 0xc42001c4c0, 0xe, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42004d680, ...)
        /home/abuild/rpmbuild/BUILD/portusctl-master/http.go:69 +0xaa
main.healthAction(0xc42006e9a0, 0x0, 0xc42006e9a0)
        /home/abuild/rpmbuild/BUILD/portusctl-master/health.go:35 +0x17c
gopkg.in/urfave/cli%2ev1.HandleAction(0x55779c2f0cc0, 0x55779c35b2c0, 0xc42006e9a0, 0xc420054100, 0x0)
        /home/abuild/go/src/gopkg.in/urfave/cli.v1/app.go:490 +0xd4
gopkg.in/urfave/cli%2ev1.Command.Run(0x55779c0231ec, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x55779c029c9a, 0x1b, 0x0, ...)
        /home/abuild/go/src/gopkg.in/urfave/cli.v1/command.go:210 +0xa97
gopkg.in/urfave/cli%2ev1.(*App).Run(0xc420062820, 0xc420010140, 0x5, 0x5, 0x0, 0x0)
        /home/abuild/go/src/gopkg.in/urfave/cli.v1/app.go:255 +0x6fa
gopkg.in/urfave/cli%2ev1.(*App).RunAndExitOnError(0xc420062820)
        /home/abuild/go/src/gopkg.in/urfave/cli.v1/app.go:276 +0x55
main.main()
        /home/abuild/rpmbuild/BUILD/portusctl-master/portusctl.go:194 +0xe61

Output of our nginx container logs looks like this:

<ip> - - [06/Nov/2019:19:13:10 +0000] "GET /favicon.ico HTTP/2.0" 502 552 "https://<our_portus_subdomain_name>/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36"
2019/11/06 19:25:38 [error] 6#6: *8684 connect() failed (113: Host is unreachable) while connecting to upstream, client: <ip>, server: <our_portus_subdomain_name>, request: "GET / HTTP/2.0", upstream: "https://<portus_docker_container_IP>:3000/", host: "<our_portus_subdomain_name>"
<ip> - - [06/Nov/2019:19:25:38 +0000] "GET / HTTP/2.0" 502 552 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36"
2019/11/06 19:25:41 [error] 6#6: *8684 connect() failed (113: Host is unreachable) while connecting to upstream, client: <ip>, server: <our_portus_subdomain_name>, request: "GET /favicon.ico HTTP/2.0", upstream: "https://<portus_docker_container_IP>:3000/favicon.ico", host: "<our_portus_subdomain_name>", referrer: "https://<our_portus_subdomain_name>/"
<ip> - - [06/Nov/2019:19:25:41 +0000] "GET /favicon.ico HTTP/2.0" 502 552 "https://<our_portus_subdomain_name>/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36"
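A side note on the nginx errors above: connect() failed (113: Host is unreachable) against a hard-coded container IP usually means nginx resolved the portus hostname once at startup and kept that address after the container died or was recreated. A hedged mitigation, assuming the containers share a compose network (127.0.0.11 is Docker's embedded DNS server):

```nginx
# Force nginx to re-resolve the upstream name at request time instead of
# caching the container IP from startup. Goes in the server block that
# proxies to Portus.
resolver 127.0.0.11 valid=10s;           # Docker's embedded DNS server
set $portus_upstream "https://portus:3000";
location / {
    proxy_pass $portus_upstream;         # variable => resolved per request
}
```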

Deployment information

Deployment method: We have Portus hosted on an AWS EC2 instance. Output of docker ps looks like so:

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b7bb7557bcf4 nginx:alpine "nginx -g 'daemon off;'" 6 weeks ago Up 3 weeks 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp portus_nginx_1
5035ec47f25b opensuse/portus:head "/init" 6 weeks ago Up 3 weeks 3000/tcp portus_background_1
d9d7f7f8ca0f registry:2.6 "/entrypoint.sh /bin/sh /etc/docker/registry/init" 6 weeks ago Up 3 weeks 0.0.0.0:5000-5001->5000-5001/tcp portus_registry_1
337b399fcd28 mariadb:10.0.23 "/docker-entrypoint.sh mysqld --character-set-server=utf8 --collation-server=utf8_unicode_ci '--init-connect=SET NAMES UTF8;' --innodb-flush-log-at-trx-commit=0" 6 weeks ago Up 3 weeks 3306/tcp portus_db_1

Configuration:

ubuntu@<ec2-ip-address>:~/portus$ docker exec -it 5035ec portusctl exec rake portus:info
[schema] Selected the schema for mysql
[Mailer config] Host:     <snip>
[Mailer config] Protocol: https://
Evaluated configuration:
---
email:
  from: <snip>
  name: <snip>
  reply_to: ''
  smtp:
    enabled: false
    address: <snip>
    port: 587
    domain: <snip>
    ssl_tls: ''
    enable_starttls_auto: false
    openssl_verify_mode: none
    ca_path: ''
    ca_file: ''
    user_name: ''
    password: "****"
    authentication: login
gravatar:
  enabled: true
delete:
  enabled: true
  contributors: false
  garbage_collector:
    enabled: false
    older_than: 30
    keep_latest: 5
    tag: ''
ldap:
  enabled: false
  hostname: ldap_hostname
  port: 389
  timeout: 5
  encryption:
    method: ''
    options:
      ca_file: ''
      ssl_version: TLSv1_2
  base: ''
  admin_base: ''
  group_base: ''
  filter: ''
  uid: uid
  authentication:
    enabled: false
    bind_dn: ''
    password: "****"
  group_sync:
    enabled: true
    default_role: viewer
  guess_email:
    enabled: false
    attr: ''
oauth:
  local_login:
    enabled: true
  google_oauth2:
    enabled: false
    id: ''
    secret: ''
    domain: ''
    options:
      hd: ''
  open_id:
    enabled: false
    identifier: ''
    domain: ''
  openid_connect:
    enabled: false
    issuer: ''
    identifier: ''
    secret: ''
  github:
    enabled: false
    client_id: ''
    client_secret: ''
    organization: ''
    team: ''
    domain: ''
  gitlab:
    enabled: false
    application_id: ''
    secret: ''
    group: ''
    domain: ''
    server: ''
  bitbucket:
    enabled: false
    key: ''
    secret: ''
    domain: ''
    options:
      team: ''
first_user_admin:
  enabled: true
signup:
  enabled: true
check_ssl_usage:
  enabled: true
registry:
  jwt_expiration_time:
    value: 15
  catalog_page:
    value: 100
  timeout:
    value: 2
  read_timeout:
    value: 120
machine_fqdn:
  value: <snip>
display_name:
  enabled: false
user_permission:
  change_visibility:
    enabled: true
  create_team:
    enabled: true
  manage_team:
    enabled: true
  create_namespace:
    enabled: true
  manage_namespace:
    enabled: true
  create_webhook:
    enabled: true
  manage_webhook:
    enabled: true
  push_images:
    policy: allow-teams
security:
  clair:
    server: ''
    health_port: 6061
    timeout: 900
  zypper:
    server: ''
  dummy:
    server: ''
anonymous_browsing:
  enabled: true
background:
  registry:
    enabled: true
  sync:
    enabled: true
    strategy: initial
pagination:
  per_page: 10
  before_after: 2

Portus version: 2.5.0-dev@a1b9f2ebfeb84680a9dcd5629195e4c52815735c

It's odd that the config reports github as disabled, since we're using GitHub authentication to log into Portus.

kinland commented 4 years ago

Additional observation: upon restarting Portus, there are now five docker containers running (as opposed to four). There are two containers running opensuse/portus:head with the /init command:

CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                                      NAMES
b7bb7557bcf4        nginx:alpine           "nginx -g 'daemon of…"   6 weeks ago         Up 3 weeks          0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   portus_nginx_1
5035ec47f25b        opensuse/portus:head   "/init"                  6 weeks ago         Up 3 weeks          3000/tcp                                   portus_background_1
d9d7f7f8ca0f        registry:2.6           "/entrypoint.sh /bin…"   6 weeks ago         Up 3 weeks          0.0.0.0:5000-5001->5000-5001/tcp           portus_registry_1
337b399fcd28        mariadb:10.0.23        "/docker-entrypoint.…"   6 weeks ago         Up 3 weeks          3306/tcp                                   portus_db_1

vs

CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                                      NAMES
b7bb7557bcf4        nginx:alpine           "nginx -g 'daemon of…"   6 weeks ago         Up 17 seconds       0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   portus_nginx_1
5035ec47f25b        opensuse/portus:head   "/init"                  6 weeks ago         Up 15 seconds       3000/tcp                                   portus_background_1
d9d7f7f8ca0f        registry:2.6           "/entrypoint.sh /bin…"   6 weeks ago         Up 6 seconds        0.0.0.0:5000-5001->5000-5001/tcp           portus_registry_1
e93a32f5e405        opensuse/portus:head   "/init"                  6 weeks ago         Up 6 seconds        0.0.0.0:3000->3000/tcp                     portus_portus_1
337b399fcd28        mariadb:10.0.23        "/docker-entrypoint.…"   6 weeks ago         Up 15 seconds       3306/tcp                                   portus_db_1

I noticed this in the logs of the container that appeared upon restart. Maybe it's a memory leak? (Cannot allocate memory - fork(2))

Started POST "/v2/webhooks/events" for 172.21.0.6 at 2019-11-05 22:28:10 +0000
Processing by Api::V2::EventsController#create as JSON
Completed 202 Accepted in 7ms (ActiveRecord: 4.2ms)
[31] ! Terminating timed out worker: 174
Cannot allocate memory - fork(2)
[151] ! Detected parent died, dying
[152] ! Detected parent died, dying
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/cluster.rb:139:in `fork'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/cluster.rb:139:in `block in spawn_workers'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/cluster.rb:135:in `times'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/cluster.rb:135:in `spawn_workers'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/cluster.rb:213:in `check_workers'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/cluster.rb:486:in `run'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/launcher.rb:186:in `run'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/cli.rb:80:in `run'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/control_cli.rb:261:in `start'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/lib/puma/control_cli.rb:225:in `run'
/srv/Portus/vendor/bundle/ruby/2.6.0/gems/puma-3.12.1/bin/pumactl:8:in `<top (required)>'
/srv/Portus/vendor/bundle/ruby/2.6.0/bin/pumactl:23:in `load'
/srv/Portus/vendor/bundle/ruby/2.6.0/bin/pumactl:23:in `<main>'
[84] ! Detected parent died, dying
exit status 1
/usr/bin/bundle:23:in `load': cannot load such file -- /usr/lib64/ruby/gems/2.6.0/gems/bundler-1.16.4/exe/bundle (LoadError)
        from /usr/bin/bundle:23:in `<main>'

portus_portus_1 comes from the docker-compose.yml:

ubuntu@<ip-aws-ec2>:~/portus$ cat docker-compose.yml
version: "2"

services:
  portus:
    image: opensuse/portus:head
    environment:
      - PORTUS_MACHINE_FQDN_VALUE=${MACHINE_FQDN}

      # DB. The password for the database should definitely not be here. You are
      # probably better off with Docker Swarm secrets.
      - PORTUS_DB_HOST=db
      - PORTUS_DB_DATABASE=portus_production
      - PORTUS_DB_PASSWORD=${DATABASE_PASSWORD}
      - PORTUS_DB_POOL=5

      # Secrets. It can possibly be handled better with Swarm's secrets.
      - PORTUS_SECRET_KEY_BASE=${SECRET_KEY_BASE}
      - PORTUS_KEY_PATH=/certificates/portus.key
      - PORTUS_PASSWORD=${PORTUS_PASSWORD}

      # SSL
      - PORTUS_PUMA_TLS_KEY=/certificates/portus.key
      - PORTUS_PUMA_TLS_CERT=/certificates/portus.crt

      # NGinx is serving the assets instead of Puma. If you want to change this,
      # uncomment this line.
      #- RAILS_SERVE_STATIC_FILES='true'
    ports:
      - 3000:3000
    links:
      - db
    volumes:
      - ./secrets:/certificates:ro
      - static:/srv/Portus/public
      - ./config-github.yml:/srv/Portus/config/config-local.yml:ro

  background:
    image: opensuse/portus:head
    depends_on:
      - portus
      - db
    environment:
      # Theoretically not needed, but cconfig's been buggy on this...
      - CCONFIG_PREFIX=PORTUS
      - PORTUS_MACHINE_FQDN_VALUE=${MACHINE_FQDN}

      # DB. The password for the database should definitely not be here. You are
      # probably better off with Docker Swarm secrets.
      - PORTUS_DB_HOST=db
      - PORTUS_DB_DATABASE=portus_production
      - PORTUS_DB_PASSWORD=${DATABASE_PASSWORD}
      - PORTUS_DB_POOL=5

      # Secrets. It can possibly be handled better with Swarm's secrets.
      - PORTUS_SECRET_KEY_BASE=${SECRET_KEY_BASE}
      - PORTUS_KEY_PATH=/certificates/portus.key
      - PORTUS_PASSWORD=${PORTUS_PASSWORD}

      - PORTUS_BACKGROUND=true
    links:
      - db
    volumes:
      - ./secrets:/certificates:ro

  db:
    image: library/mariadb:10.0.23
    command: mysqld --character-set-server=utf8 --collation-server=utf8_unicode_ci --init-connect='SET NAMES UTF8;' --innodb-flush-log-at-trx-commit=0
    environment:
      - MYSQL_DATABASE=portus_production

      # Again, the password shouldn't be handled like this.
      - MYSQL_ROOT_PASSWORD=${DATABASE_PASSWORD}
    volumes:
      - ./portus/mariadb:/var/lib/mysql

  registry:
    image: library/registry:2.6
    command: ["/bin/sh", "/etc/docker/registry/init"]
    environment:
      # Authentication
      REGISTRY_AUTH_TOKEN_REALM: https://${MACHINE_FQDN}/v2/token
      REGISTRY_AUTH_TOKEN_SERVICE: ${MACHINE_FQDN}
      REGISTRY_AUTH_TOKEN_ISSUER: ${MACHINE_FQDN}
      REGISTRY_AUTH_TOKEN_ROOTCERTBUNDLE: /secrets/portus.crt

      # SSL
      REGISTRY_HTTP_TLS_CERTIFICATE: /secrets/portus.crt
      REGISTRY_HTTP_TLS_KEY: /secrets/portus.key

      # Portus endpoint
      REGISTRY_NOTIFICATIONS_ENDPOINTS: >
        - name: portus
          url: https://${MACHINE_FQDN}/v2/webhooks/events
          timeout: 2000ms
          threshold: 5
          backoff: 1s
    volumes:
      - ./portus/registry:/var/lib/registry
      - ./secrets:/secrets:ro
      - ./registry/config.yml:/etc/docker/registry/config.yml:ro
      - ./registry/init:/etc/docker/registry/init:ro
    ports:
      - 5000:5000
      - 5001:5001 # required to access debug service
    links:
      - portus:portus

  nginx:
    image: library/nginx:alpine
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./secrets:/secrets:ro
      - static:/srv/Portus/public:ro
    ports:
      - 80:80
      - 443:443
    links:
      - registry:registry
      - portus:portus

volumes:
  static:
    driver: local
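
Not a fix for the root cause, but given the Cannot allocate memory - fork(2) trace above, the compose file could cap the Portus service's memory and have Docker bring the service back when Puma dies, instead of nginx serving 502s until someone runs docker-compose restart. A sketch in the same compose v2 syntax (the 2g cap is a guess; size it to the EC2 instance):

```yaml
# Sketch: additions to the portus (and background) service definitions above.
services:
  portus:
    restart: unless-stopped   # restart the service if the process dies
    mem_limit: 2g             # guessed cap; tune to the host's available RAM
```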
Jean-Baptiste-Lasselle commented 4 years ago

Hi @dhorkin, I read through all of your info in detail, and:

docker-compose up -d --force-recreate
# don't add the [--build] option
# and before you force-recreate, first check
# that you've got enough RAM available on the
# containerization host where you run the
# docker commands
stale[bot] commented 4 years ago

Thanks for all your contributions! This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.