goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
Apache License 2.0

harbor nginx not getting the right container IP after core and portal container restart #19384

Closed p2p-solutions closed 6 months ago

p2p-solutions commented 10 months ago

Steps to reproduce:

  1. Install Harbor.
  2. Restart the core and portal containers.
  3. Try to log in to the registry, or try to upload/download an image. We use: docker login docker-registry.$(hostname -d):5000 -u admin -p $(cat /opt/mcspace/harbor/.registry) or: skopeo copy --sign-by infra@${domain} --sign-passphrase-file /opt/mcspace/gpg/passphrase --retry-times 3 --insecure-policy docker-archive:${image_file_to_load} docker://${docker_registry}/${project_name}/${image_name}
  4. You will see an error: Error: authenticating creds for "docker-registry.some_domain.tld:5000": pinging container registry docker-registry.some_domain.tld:5000: invalid status code from registry 502 (Bad Gateway)
  5. In nginx container log you will see: 2023/09/21 08:26:30 [error] 7#0: *2 connect() failed (113: No route to host) while connecting to upstream, client: 10.164.xx.yyy, server: , request: "GET /v2/ HTTP/1.1", upstream: "https://10.89.xx.80:8443/v2/", host: "docker-registry.some_domain.tld:5000"
  6. After the portal and core containers are restarted, they get new internal IP addresses, but nginx caches the old addresses and does not re-resolve the new service IPs.
  7. What we tried was to make nginx skip the cache with the following options. Under the server section: resolver valid=1s; resolver_timeout 1s; and under the location sections (as an example): set $core_c_var "https://core:8443/c/"; proxy_pass $core_c_var; So far none of our attempts has succeeded. Can you give us a clue on how to solve this issue?
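
For readers following along, here is a hedged sketch of how the attempt in step 7 might be laid out as an nginx fragment. It is not Harbor's shipped configuration. Two details are assumptions rather than facts from this thread: the resolver address 127.0.0.11 (Docker's embedded DNS on user-defined networks; other runtimes use a different address) and the variable name $core_upstream. As quoted above, the resolver line carries no address, while the nginx documentation describes the directive as taking at least one name server address.

```nginx
# Sketch only: the resolver address and the variable name are assumptions.
server {
    # 127.0.0.11 is Docker's embedded DNS (assumed runtime); valid=1s
    # overrides the DNS TTL so names are re-resolved about every second.
    resolver 127.0.0.11 valid=1s;
    resolver_timeout 1s;

    location /c/ {
        # A variable in proxy_pass makes nginx resolve the host name at
        # request time instead of once at startup. No URI part is given,
        # so the original request URI (/c/...) is forwarded unchanged.
        set $core_upstream "https://core:8443";
        proxy_pass $core_upstream;
    }
}
```

Two caveats may explain why an attempt like this has no effect. First, when proxy_pass uses a variable that includes a URI part (such as "https://core:8443/c/"), nginx sends that URI as is, replacing the original request path. Second, if the configuration also defines an upstream group named core, nginx uses that group rather than the resolver, and the addresses in the group are resolved only when the configuration is loaded. Whether the generated Harbor nginx.conf defines such a group should be checked in the actual file.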

Thanks in advance!

zyyw commented 10 months ago

Hi @vladichorny, may we know which version of Harbor you are using? The best practice in your case might be to restart with sudo docker-compose down && sudo docker-compose up -d, which shuts down and restarts all containers together, instead of restarting the core/portal containers independently.

p2p-solutions commented 9 months ago

Sure: Harbor offline installer v2.8.2, but I believe this issue is present in other versions as well.

p2p-solutions commented 9 months ago

That is a manual solution that can't be used on long-lived installations. No one will check every so often whether Harbor is inaccessible. For now we have a workaround: we wrote a simple script (wrapped as a service) that checks whether the uptime of any Harbor container is less than the uptime of the nginx container and, if so, restarts the nginx container. But it is only a workaround while we wait for a proper solution from the Harbor side :)
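
The script itself was not posted, so the following is only a minimal sketch of the kind of check described above, not the actual workaround. It assumes the Docker CLI, GNU date, and the container names nginx, harbor-core and harbor-portal; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a watchdog: restart nginx when another Harbor container has
# started more recently than nginx did.
# Assumptions: Docker CLI, GNU date, and the container names below.

NGINX_CONTAINER="${NGINX_CONTAINER:-nginx}"
BACKEND_CONTAINERS="${BACKEND_CONTAINERS:-harbor-core harbor-portal}"

# Print a container's start time as seconds since the epoch.
started_epoch() {
  local ts
  ts="$(docker inspect -f '{{.State.StartedAt}}' "$1")" || return 1
  date -d "$ts" +%s
}

# Succeed (exit 0) if any backend start time is later than nginx's.
# Usage: nginx_is_stale NGINX_EPOCH BACKEND_EPOCH...
nginx_is_stale() {
  local nginx_epoch="$1" backend_epoch
  shift
  for backend_epoch in "$@"; do
    if [ "$backend_epoch" -gt "$nginx_epoch" ]; then
      return 0
    fi
  done
  return 1
}

# One check pass: collect start times, restart nginx if it is stale.
check_once() {
  local nginx_epoch name epoch epochs
  epochs=""
  nginx_epoch="$(started_epoch "$NGINX_CONTAINER")" || return 1
  for name in $BACKEND_CONTAINERS; do
    epoch="$(started_epoch "$name")" || return 1
    epochs="$epochs $epoch"
  done
  # $epochs is left unquoted on purpose so it splits into one argument
  # per container.
  if nginx_is_stale "$nginx_epoch" $epochs; then
    docker restart "$NGINX_CONTAINER"
  fi
}
```

Sourcing the file and calling check_once from a cron job or a systemd timer would give the periodic behaviour. Comparing start times rather than uptimes is equivalent here: a backend that started after nginx has the shorter uptime.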

p2p-solutions commented 9 months ago

Any update?

liubin commented 9 months ago

It's an issue with nginx itself: it caches the upstream's IP and doesn't refresh it at runtime. Some additional configs can fix it, but it's a bit troublesome; you can refer to these:

I had the same issue before, but now we run nginx and core in a 1-1 relation in one pod, so the upstream will always be

github-actions[bot] commented 7 months ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] commented 6 months ago

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.