goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0
24.29k stars 4.77k forks source link

Harbor Proxy should serve manifests from local repository if the remote manifest digest matches a local manifest #21122

Open raphaelzoellner opened 1 month ago

raphaelzoellner commented 1 month ago

Is your feature request related to a problem? Please describe. DockerHub Proxy GET /manifest requests referencing a tag are not served from cache even when the remote manifest matches a manifest in the local repository.

This leads to requests reducing the users remainder of the rate limit.

This can be observed by running harbor with log level debug and fetching a manifest multiple times.

GET https://harbor.example.com/v2/docker-hub/goharbor/harbor-exporter/manifests/v2.10.2

[DEBUG] [/controller/proxy/controller.go:196]: Digest is not found in manifest list cache, key=cache:manifestlist:docker-hub/goharbor/harbor-exporter:sha256:b6bb051a967de0948992f4a44ed00369adcc04ba88cdc76ed8ddeb4326ccf8be
[DEBUG] [/server/middleware/repoproxy/proxy.go:203]: the tag is v2.10.2, digest is 
[WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo

https://github.com/goharbor/harbor/blob/v2.10.2/src/controller/proxy/controller.go#L193-L201C1 https://github.com/goharbor/harbor/blob/v2.10.2/src/server/middleware/repoproxy/proxy.go#L203-L209

Describe the solution you'd like Harbor should serve the manifest from the local repository if the remote manifest digest matches the digest of the manifest in the local repository.

Describe the main design/architecture of your solution Since Harbor already attempts to use a cache for ManifestLists I suggest to extend this behavior to Manifests by attempting to pull the manifest from the local repository.

https://github.com/goharbor/harbor/blob/v2.10.2/src/controller/proxy/controller.go#L159

Update: Since middlewares would be skipped by the extended behavior described above, I suggest to push the manifest to the local repository with reference to the digest and tag if both are known instead.

https://github.com/goharbor/harbor/blob/a0d27d32cc78234a888ed0c73ae3870e7b62ddd4/src/controller/proxy/manifestcache.go#L201

stonezdj commented 3 weeks ago

Could you please double check the artifact with the same tag and same digest exist in the proxy cache project? It seems that the artifact didn't cached yet from the log.

There are two caches, one is the registry, another is the redis cache, the following log is refer the redis cache. [/controller/proxy/controller.go:196]: Digest is not found in manifest list cache, key=cache:manifestlist:docker-hub/goharbor/harbor-exporter:sha256:b6bb051a967de0948992f4a44ed00369adcc04ba88cdc76ed8ddeb4326ccf8be

raphaelzoellner commented 3 weeks ago

Sure, I've reproduced the behavior.

I've performed multiple GET manifest requests referencing the tag. https://harbor.example.com/v2/docker-hub/goharbor/harbor-exporter/manifests/v2.10.2

2024-11-05T09:26:28Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo
2024-11-05T09:26:52Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo
2024-11-05T09:27:47Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo
2024-11-05T09:31:22Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo

The corresponding artifact in the Harbor portal does not seem to have a tag associated with it, but was already created before the last GET request. grafik

On the other hand when performing a GET manifest request referencing the digest, logs of the form is not found in proxy cache, fetch it from remote repo seem not to be created, I assume the manifest is served from the local repository in this case. https://harbor.example.com/v2/docker-hub/goharbor/harbor-exporter/manifests/sha256:b6bb051a967de0948992f4a44ed00369adcc04ba88cdc76ed8ddeb4326ccf8be

stonezdj commented 3 weeks ago

So the problem is the tag is not created, not proxied image is not served?

raphaelzoellner commented 3 weeks ago

Hello yes, the images are currently proxied in this case, but not served from the local repository cache even though the manifest and layers exist in the local repository already.

The cause for this behavior seems to be that the manifest was initially only pushed with a reference to the digest, but not the tag.

stonezdj commented 2 weeks ago

The root cause is the tag is not created in proxy cache project, and the request is pull image by tag, then it cause the image is not served with cached content. is that correct?

raphaelzoellner commented 2 weeks ago

Yes I think so, even if the tag exists in the proxy cache project, the digest should be compared (which should already be the case) since tags might be moved in the upstream registry to a new manifest referencing other layers. https://github.com/goharbor/harbor/blob/1f75b7aaef21a1d87da8dbbff20022f58d82f9d4/src/controller/proxy/controller.go#L194C1-L194C63

I've created the following PR to push not only the manifest referenced by digest but also push the manifest referenced by tag if it is known, which aims to solve this. https://github.com/goharbor/harbor/pull/21141

@stonezdj was this PR intentionally closed? https://github.com/goharbor/harbor/pull/21141#issuecomment-2467649068

raphaelzoellner commented 1 week ago

Hello @stonezdj,

why was #21141 closed? Could this have been a mixup with #21123, which was already closed?

If this was intentionally closed, could you describe the concerns, so one can think about another solution to the problem?

stonezdj commented 3 days ago

see comments: https://github.com/goharbor/harbor/pull/21141/files#r1860080983