goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

harbor nginx not getting the right container IP after core and portal container restart #19384

Closed. p2p-solutions closed this issue 6 months ago

p2p-solutions commented 10 months ago

Steps to reproduce:

  1. Install Harbor.
  2. Restart the core and portal containers.
  3. Try to log in to the registry or to upload/download an image. We are using `docker login docker-registry.$(hostname -d):5000 -u admin -p $(cat /opt/mcspace/harbor/.registry)` or `skopeo copy --sign-by infra@${domain} --sign-passphrase-file /opt/mcspace/gpg/passphrase --retry-times 3 --insecure-policy docker-archive:${image_file_to_load} docker://${docker_registry}/${project_name}/${image_name}`.
  4. You will see an error: `Error: authenticating creds for "docker-registry.some_domain.tld:5000": pinging container registry docker-registry.some_domain.tld:5000: invalid status code from registry 502 (Bad Gateway)`
  5. In the nginx container log you will see: `2023/09/21 08:26:30 [error] 7#0: *2 connect() failed (113: No route to host) while connecting to upstream, client: 10.164.xx.yyy, server: , request: "GET /v2/ HTTP/1.1", upstream: "https://10.89.xx.80:8443/v2/", host: "docker-registry.some_domain.tld:5000"`
  6. After the portal and core containers are restarted they get new internal IP addresses, but nginx has cached the old addresses and does not re-resolve the new service IPs.
  7. What we tried is to make nginx skip the cache: under the server section we added `resolver 127.0.0.11 valid=1s; resolver_timeout 1s;` and under the location sections we tried (as an example) `set $core_c_var "https://core:8443/c/"; proxy_pass $core_c_var;` (laid out in the sketch below). So far none of our attempts has worked. Can you give us a clue on how to solve this issue?
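For reference, here is the step 7 attempt laid out as a config sketch; the `core` service name and port 8443 are taken from the report above, and this is not Harbor's shipped nginx.conf. Note that `resolver` only takes effect for hostnames that nginx resolves at request time, i.e. names referenced through a variable in `proxy_pass`; names listed in a static `upstream {}` block are resolved once at startup, so if the generated config routes through such a block these directives alone will not help.

```nginx
server {
    # Docker's embedded DNS server; cache answers for at most one second.
    resolver 127.0.0.11 valid=1s;
    resolver_timeout 1s;

    location /c/ {
        # Referencing the upstream through a variable forces nginx to
        # resolve "core" per request instead of reusing the IP it
        # resolved when the worker started.
        set $core_c_var "https://core:8443/c/";
        proxy_pass $core_c_var;
    }
}
```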

Thanks in advance!

zyyw commented 10 months ago

Hi @vladichorny, may we know which version of Harbor you are using? The best practice in your case might be to restart with `sudo docker-compose down && sudo docker-compose up -d`, shutting down and restarting all containers together instead of restarting the core/portal containers independently.
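For example, assuming Harbor was installed with the offline installer and `docker-compose.yml` sits in the installation directory (the path below is illustrative):

```bash
# Illustrative path; use your actual Harbor installation directory.
cd /opt/harbor

# Take all Harbor containers down and bring them back up together, so the
# proxy and the core/portal services start against the same network state.
sudo docker-compose down
sudo docker-compose up -d
```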

p2p-solutions commented 9 months ago

> Hi @vladichorny, may we know which version of Harbor you are using? The best practice in your case might be to restart with `sudo docker-compose down && sudo docker-compose up -d`, shutting down and restarting all containers together instead of restarting the core/portal containers independently.

Sure: Harbor offline installer v2.8.2, but I believe this issue is present in other versions as well.

p2p-solutions commented 9 months ago

> Hi @vladichorny, may we know which version of Harbor you are using? The best practice in your case might be to restart with `sudo docker-compose down && sudo docker-compose up -d`, shutting down and restarting all containers together instead of restarting the core/portal containers independently.

That is a manual solution that can't be used for long-lived deployments; no one is going to check every so often whether Harbor is inaccessible. For now we have a workaround: a simple script (wrapped as a service) that checks whether any Harbor container's uptime is less than the nginx container's uptime and, if so, restarts the nginx container (sketched below). But it is only a workaround while we wait for a proper solution from the Harbor side )))
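A minimal sketch of such a watchdog, assuming the default container names from Harbor's docker-compose file (`nginx`, `harbor-core`, `harbor-portal`); adjust the names and run it from cron or a systemd timer:

```bash
#!/usr/bin/env bash
# Restart the proxy if any backend container was started after it,
# i.e. whenever nginx may be holding a stale upstream IP.
set -euo pipefail

proxy=nginx                            # assumed proxy container name
backends=(harbor-core harbor-portal)   # assumed backend container names

# Container start time as a Unix timestamp.
started_at() {
    date -d "$(docker inspect --format '{{.State.StartedAt}}' "$1")" +%s
}

proxy_started=$(started_at "$proxy")

for c in "${backends[@]}"; do
    if [ "$(started_at "$c")" -gt "$proxy_started" ]; then
        docker restart "$proxy"
        break
    fi
done
```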

p2p-solutions commented 9 months ago

Any update?

liubin commented 9 months ago

It's an issue with nginx itself: it caches the upstream's IP and doesn't refresh it at runtime. Some additional config can fix it, but it's a bit troublesome; you can refer to these:

I had the same issue before, but now we are running nginx and core in a 1:1 relation in one pod, so the upstream will always be 127.0.0.1.
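For illustration, a hypothetical podman-style sketch of that layout; the image names, tag, and ports are placeholders, and the Harbor config/cert mounts a real deployment needs are omitted:

```bash
# Proxy and core share one pod, hence one network namespace: nginx can
# always reach core at 127.0.0.1 regardless of container restarts.
podman pod create --name harbor-web -p 5000:8443

podman run -d --pod harbor-web --name core  goharbor/harbor-core:v2.8.2
podman run -d --pod harbor-web --name proxy goharbor/nginx-photon:v2.8.2

# Inside the pod, nginx's upstream can simply be https://127.0.0.1:8443,
# which never changes when the core container is recreated.
```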

github-actions[bot] commented 7 months ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] commented 6 months ago

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.