goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

Images exist in a proxy cache project but do not show in the project repositories list. #19609

Open qy527145 opened 10 months ago

qy527145 commented 10 months ago

I set the library project to the Proxy Cache type and pointed it at Docker Hub. Then I configured the Docker client to pull images through Harbor's library project. I am sure the image has been cached in the library project, but I cannot find it in the project's repositories list.

qy527145 commented 10 months ago

I noticed that if the Proxy Cache option is not checked when creating a project, pushed images can still be seen.

stonezdj commented 10 months ago

Blobs are cached first, then the manifest is cached after about 15 minutes. Manifest caching might fail if some blobs are missing (the client may have skipped pulling them through the proxy cache).

qy527145 commented 9 months ago

Is there a solution or a plan to fix it?

nicl-dev commented 9 months ago

We are facing the same problem. We want to use Harbor to cache and scan all images running in our clusters. However, that is not possible with unreliable behavior. Right now there are probably ~20 images not showing up in our proxy caches at all (we have 4). We even have a cache that shows a used quota of 0MB even though the image was 100% pulled through the proxy cache (tested by trying to pull with a misconfigured proxy cache that threw an error until we fixed it). This can't be working as intended, right?

[Two screenshots attached, dated 2023-12-20]

MaJaHa95 commented 9 months ago

I've been facing the same issue. Turning on debug logs brought me an error:

error the artifact is not ready yet, failed to tag it to v0.26.0

And that, in turn, brought me to a few issues that seem to share symptoms:

  1. https://github.com/goharbor/harbor/issues/14791
  2. https://github.com/goharbor/harbor/issues/18173
  3. https://github.com/goharbor/harbor/issues/18335 (sorta related)

The first ends with a comment by @stonezdj:

Summary of this issue: the image tag is not guaranteed to be cached in Harbor, because it depends on the docker client's behavior. If the tag/digest information is cached locally, the docker client just sends a pull-by-digest request to the server, so Harbor cannot cache the image by tag and caches it by digest instead.

It's not entirely clear to me what this means, though. It's possible the situations are actually just fundamentally different, but which "local docker client" are we referring to?

I'm migrating pods that were previously using Docker Hub (and such) images directly, so it's true that the docker client on my Kubernetes nodes would have those images cached. And I can understand--sorta? maybe?--how that would mess up the way Harbor retrieves the images. But I don't know why Harbor (running in a pod) would be reading the Kubernetes node cache, and even if it were, I'd still expect to see something in the UI, even if it weren't perfect end-to-end.

Then there's this other comment, which is saying the same thing, but I'm still not sure how to apply it to this situation:

Because the Harbor proxy relies on the HEAD request to link the digest and the tag.

If the tag and its digest information are cached locally, then the client won't send the HEAD request to the server, and Harbor doesn't link the tag and the digest.
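To make the two quoted comments concrete, here is a toy model of the behavior they describe. It is a sketch under assumptions, not Harbor's actual code: the class `ToyProxyCache`, its method names, and the digest `sha256:aaa` are all made up for illustration, and the rule "a tag is only listed if it was linked via a by-tag request" is just my reading of the comments.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProxyCache:
    """Toy model of the behavior described above; NOT Harbor's real code."""
    upstream: dict  # hypothetical upstream registry: tag -> digest
    cached_digests: set = field(default_factory=set)
    tag_links: dict = field(default_factory=dict)

    def head_by_tag(self, tag: str) -> str:
        # A by-tag request tells the proxy which digest the tag points to.
        digest = self.upstream[tag]
        self.tag_links[tag] = digest
        return digest

    def get_by_digest(self, digest: str) -> None:
        # The proxy can cache the content, but this request carries no tag.
        self.cached_digests.add(digest)

    def visible_tags(self) -> list:
        # A tag is listed only if it was linked and its digest is cached.
        return sorted(t for t, d in self.tag_links.items()
                      if d in self.cached_digests)


upstream = {"v0.26.0": "sha256:aaa"}

# Client 1: resolves the tag through the proxy, then fetches by digest.
fresh = ToyProxyCache(upstream)
fresh.get_by_digest(fresh.head_by_tag("v0.26.0"))

# Client 2: already knows the digest locally, so it skips the by-tag request.
warm = ToyProxyCache(upstream)
warm.get_by_digest("sha256:aaa")

print(fresh.visible_tags())  # ['v0.26.0']
print(warm.visible_tags())   # []
```

In this model the second client still gets its content, but the proxy never learns the tag, which would match the "pulls work but nothing is tagged" symptom.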

But yeah, all in all, I'm having the exact same symptoms as nicl-dev above: pulls work great, but they don't show in the UI, and the project quota usage is still 0 bytes, and scans don't run.

One thing I'll add, though, is that if I do a docker pull harbor.example.com/proxy-project/image:latest, even right after a Kubernetes pod showed these symptoms, everything works beautifully. Someone in one of those issues mentioned the same thing. I've had Kubernetes work exactly once (out of maybe ten images I've tried), and that was in a pod where other images failed. I haven't once experienced the issue with docker pull.
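For anyone who wants to check what Harbor has recorded without relying on the UI, a minimal sketch of querying the v2.0 REST API follows. The repository name `library/nginx` is a hypothetical example, `harbor.example.com` and `proxy-project` are the placeholders used above, and authentication is left out, so `list_artifacts` would only work as written against a public project.

```python
import json
import urllib.request
from urllib.parse import quote


def artifacts_url(harbor: str, project: str, repository: str) -> str:
    # Harbor's v2.0 API expects a repository name containing "/" to be
    # URL-encoded twice: a/b -> a%2Fb -> a%252Fb.
    repo = quote(quote(repository, safe=""), safe="")
    proj = quote(project, safe="")
    return f"https://{harbor}/api/v2.0/projects/{proj}/repositories/{repo}/artifacts"


def list_artifacts(url: str) -> list:
    # Performs the actual request; credentials are omitted in this sketch.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Only builds and prints the URL; no network call is made here.
print(artifacts_url("harbor.example.com", "proxy-project", "library/nginx"))
```

An empty list from `list_artifacts` right after a successful pull would be consistent with the symptom described here, though it would not by itself say why the artifact is missing.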

So it must be something about how Kubernetes makes the call, and it must somehow differ from docker pull. I don't know enough about their respective internals to speak to that, though, and it's unclear to me how that would even affect how Harbor handles the requests.
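Going by the comments quoted above, one way the requests could differ is whether the manifest is asked for by tag or by digest. The sketch below only constructs the two request shapes against the standard registry `/v2/` manifest endpoint so they can be compared; it sends nothing, the all-zero digest is a placeholder, and whether containerd or docker actually issues these exact requests in this situation is an assumption I have not verified.

```python
import urllib.request

# Manifest media types a client typically advertises in the Accept header.
MANIFEST_TYPES = ", ".join([
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])


def manifest_request(registry, repository, reference, method):
    # `reference` is either a tag ("latest") or a digest ("sha256:...").
    url = f"https://{registry}/v2/{repository}/manifests/{reference}"
    req = urllib.request.Request(url, method=method)
    req.add_header("Accept", MANIFEST_TYPES)
    return req


placeholder_digest = "sha256:" + "0" * 64  # hypothetical digest

# Resolving a tag: the proxy sees the tag name in the request path.
by_tag = manifest_request(
    "harbor.example.com", "proxy-project/image", "latest", "HEAD")

# Pulling by digest: the request path carries no tag at all.
by_digest = manifest_request(
    "harbor.example.com", "proxy-project/image", placeholder_digest, "GET")

for req in (by_tag, by_digest):
    print(req.get_method(), req.full_url)
```

Comparing Harbor's access log against these two shapes during a Kubernetes pull and a docker pull might show which kind of request each client actually sends.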

nicl-dev commented 9 months ago

Regarding any local caching: we suspected that it might be a caching issue and deliberately tried pulling an image that had never been used by anything in our cluster, and it still did not get picked up by the Harbor UI.

narutolied commented 8 months ago

I had the same problem and created a second deployment with the same images and tags. Suddenly they appeared in the registry as cached and could be scanned. The image also appears if it is updated and pulled again from the same deployment. Maybe it's related to how AWS nodes pull images that are already in use; the image pull policy "Always" may not be a clean way to re-pull the same images.

nicl-dev commented 7 months ago

Small addition to @narutolied's comment: I tried dozens of times to pull new images, from new deployments, that have never touched our EKS nodes, on a fresh Harbor instance, and they do not show up consistently. Most of the time they do not show up at all. Harbor does, however, show used quota. Example:

[Three screenshots attached, dated 2024-02-14]

Can anyone review PR #19910 please? It would be awesome to have a fix and reliable scanning, because right now Harbor is basically useless for our purpose.

github-actions[bot] commented 5 months ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

nicl-dev commented 5 months ago

This issue is still very relevant.

riyas-rawther commented 3 months ago

still relevant - I am using Chart 1.14.0

remkolems commented 3 months ago

On Docker Compose it is also very relevant (v2.11.0) with a MinIO S3 backend. Tags are also automatically removed when images are pulled through the proxy.

See the Garbage Collection (GC) option for untagged artifacts... active by default.

[Screenshot attached]

jseiser commented 2 months ago

Still an issue on 1.15.0

DougTea commented 2 months ago

I use Harbor to cache Docker Hub, but I still hit the Docker Hub rate limit. Some images are never cached even after being pulled many times.

nuved commented 4 days ago

Hi all, I've faced the same issue. docker pull works properly, but when using k8s with containerd and kubelet (1.27.16), Harbor does not cache or store the images at all. The kubelet can still get the images, though, and I'm wondering how it is possible to get an image without any log on the Harbor side. Does that mean Harbor is just working as a proxy without working as a cache?