dkulchinsky opened 2 years ago
This is potentially related to https://github.com/goharbor/harbor/issues/12948, which we already hit in the past, but it looks like there's been no progress there, so I'm hoping there's some new insight.
The performance issue happens on the distribution side, where it looks up and removes tags. We will do some investigation on this, but there is no specific plan so far.
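To make that cost concrete, here is a simplified model, an assumption rather than distribution's actual code, of a "find the tags that point at this digest" lookup that has to read every tag's link from the storage backend. The `backend` type and its methods are hypothetical; each `readTagLink` call stands in for one remote read (for example, one GCS request). The point is that deleting a single manifest costs work proportional to the repository's total tag count, not to the number of matching tags.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for the registry's storage driver.
// Each call to readTagLink models one remote read (e.g. one GCS request).
type backend struct {
	tagLinks map[string]string // tag name -> manifest digest
	reads    int               // number of simulated remote reads
}

// listTags returns every tag name in the repository.
func (b *backend) listTags() []string {
	tags := make([]string, 0, len(b.tagLinks))
	for t := range b.tagLinks {
		tags = append(tags, t)
	}
	return tags
}

// readTagLink returns the digest a tag points at and counts the read.
func (b *backend) readTagLink(tag string) string {
	b.reads++
	return b.tagLinks[tag]
}

// lookupTags returns every tag that points at digest by reading all tag links,
// so its cost grows with the total number of tags, not the number of matches.
func lookupTags(b *backend, digest string) []string {
	var matches []string
	for _, tag := range b.listTags() {
		if b.readTagLink(tag) == digest {
			matches = append(matches, tag)
		}
	}
	return matches
}

func main() {
	b := &backend{tagLinks: map[string]string{}}
	for i := 0; i < 4000; i++ {
		b.tagLinks[fmt.Sprintf("ci-%d", i)] = fmt.Sprintf("sha256:%d", i)
	}
	matches := lookupTags(b, "sha256:42")
	fmt.Println(len(matches), b.reads) // 1 4000
}
```

Under this model, one delete in a repository with ~4,000 tags means ~4,000 backend reads, even though only one tag matches.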
Thanks @wy65701436, is there a workaround? I was thinking of deleting these artifacts from the artifact_trash table so GC won't pick them up. I realize that we will end up with these manifests and blobs left in GCS, but given the situation that seems better than the alternative, where GC is completely broken.
I realize this is sort of a corner case, but I can imagine that others may end up in a similar situation, so hopefully you folks can find some cycles soon to take a look at fixing this.
@wy65701436 on another instance of Harbor we run, we noticed that a repository with ~4000 tags takes about 2 minutes to delete a single manifest.
It seems to me that there's a significant performance issue during GC for repositories with several thousand tags or more.
For example:
2021-10-20T11:10:52Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:261]: delete the manifest with registry v2 API: <project>/<repo>, application/vnd.docker.distribution.manifest.v2+json, sha256:fe582557fdb5eb00ca114e263784be44661f35ba1f7f15c764f0f43567a69939
2021-10-20T11:12:36Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:273]: delete manifest from storage: sha256:fe582557fdb5eb00ca114e263784be44661f35ba1f7f15c764f0f43567a69939
2021-10-20T11:12:37Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:261]: delete the manifest with registry v2 API: <project>/<repo>, application/vnd.docker.distribution.manifest.v2+json, sha256:c7faa9c6517dd640432b9172b832284b19a10324cde9782c1f16a793d8a9d041
2021-10-20T11:14:20Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:273]: delete manifest from storage: sha256:c7faa9c6517dd640432b9172b832284b19a10324cde9782c1f16a793d8a9d041
It also appears that these operations are sequential. Perhaps some form of parallelism could be introduced to speed this up? Though I think the root constraint needs to be addressed to support any significant deployment.
One option to resolve this problem: keep the artifact untagged in distribution and manage tags only on the Harbor core side.
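A rough illustration of why that could help, under stated assumptions: if distribution holds no tags, a per-delete scan of its tags has nothing to iterate, and the question "which tags point at this digest?" can be answered from an index kept on the core side (in practice presumably a database index). The in-memory `tagIndex` type below is hypothetical; its reverse map makes that lookup cost depend on the number of matching tags rather than on every tag in the repository.

```go
package main

import "fmt"

// tagIndex is a hypothetical in-memory model of tags kept on the Harbor core side.
// The reverse map lets a digest's tags be found without scanning every tag.
type tagIndex struct {
	byTag    map[string]string              // tag -> digest
	byDigest map[string]map[string]struct{} // digest -> set of tags
}

func newTagIndex() *tagIndex {
	return &tagIndex{byTag: map[string]string{}, byDigest: map[string]map[string]struct{}{}}
}

// setTag points tag at digest, moving it off any digest it pointed at before.
func (ix *tagIndex) setTag(tag, digest string) {
	if old, ok := ix.byTag[tag]; ok {
		delete(ix.byDigest[old], tag)
		if len(ix.byDigest[old]) == 0 {
			delete(ix.byDigest, old)
		}
	}
	ix.byTag[tag] = digest
	if ix.byDigest[digest] == nil {
		ix.byDigest[digest] = map[string]struct{}{}
	}
	ix.byDigest[digest][tag] = struct{}{}
}

// tagsFor returns the tags pointing at digest; cost depends only on the matches.
func (ix *tagIndex) tagsFor(digest string) []string {
	tags := make([]string, 0, len(ix.byDigest[digest]))
	for t := range ix.byDigest[digest] {
		tags = append(tags, t)
	}
	return tags
}

// untagAll removes every tag pointing at digest, e.g. before deleting the manifest.
func (ix *tagIndex) untagAll(digest string) {
	for t := range ix.byDigest[digest] {
		delete(ix.byTag, t)
	}
	delete(ix.byDigest, digest)
}

func main() {
	ix := newTagIndex()
	for i := 0; i < 20000; i++ {
		ix.setTag(fmt.Sprintf("ci-%d", i), fmt.Sprintf("sha256:%d", i%10))
	}
	fmt.Println(len(ix.tagsFor("sha256:3"))) // 2000
	ix.untagAll("sha256:3")
	fmt.Println(len(ix.byTag)) // 18000
}
```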
Since v2.5, we have introduced skipping of deletion failures in GC. This could work around the timeout.
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
still relevant
I am getting the same error with 2.10.2.
Expected behavior and actual behavior: We deleted over 20,000 tags from a repository (these tags are auto-generated by our periodic CI job that tests the registry and CI). We expected GC to clean up the related blobs and manifests, but now GC fails consistently with the following error:
2021-10-18T12:59:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:236]: 46566 blobs and 23266 manifests eligible for deletion
2021-10-18T12:59:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:237]: The GC could free up 4094 MB space, the size is a rough estimation.
2021-10-18T12:59:19Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:261]: delete the manifest with registry v2 API: <project>/<repo>/demo-go-app, application/vnd.docker.distribution.manifest.v2+json, sha256:b86d808fd22197eb01f4aeecff490a5a7c50c06db7828afb00f7fc06d40172a8
2021-10-18T13:22:10Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:264]: failed to delete manifest with v2 API, <project>/<repo>/demo-go-app, sha256:b86d808fd22197eb01f4aeecff490a5a7c50c06db7828afb00f7fc06d40172a8, retry timeout: http status code: 500, body: {"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{}}]}
2021-10-18T13:22:10Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:166]: failed to execute GC job at sweep phase, error: failed to delete manifest with v2 API: <project>/<repo>/demo-go-app, sha256:b86d808fd22197eb01f4aeecff490a5a7c50c06db7828afb00f7fc06d40172a8: retry timeout: http status code: 500, body: {"errors":[{"code":"UNKNOWN","message":"unknown error","detail":{}}]}
We caught the following log in the registry for the DELETE request:
time="2021-10-18T12:59:19.582953425Z" level=info msg="authorized request" go.version=go1.15.12 http.request.host="harbor-registry:5000" http.request.id=3c90b874-a989-41c1-b60f-e8c72c447002 http.request.method=DELETE http.request.remoteaddr="127.0.0.1:38784" http.request.uri="/v2/<project>/<repo>/demo-go-app/manifests/sha256:b86d808fd22197eb01f4aeecff490a5a7c50c06db7828afb00f7fc06d40172a8" http.request.useragent=harbor-registry-client vars.name="<project>/<repo>/demo-go-app" vars.reference="sha256:b86d808fd22197eb01f4aeecff490a5a7c50c06db7828afb00f7fc06d40172a8"
and the 500 error ~23 minutes later:
time="2021-10-18T13:22:10.756233562Z" level=error msg="response completed with error" auth.user.name="harbor_registry_user" err.code=unknown err.message="invalid checksum digest format" go.version=go1.15.12 http.request.host="harbor-registry:5000" http.request.id=3c90b874-a989-41c1-b60f-e8c72c447002 http.request.method=DELETE http.request.remoteaddr="127.0.0.1:38784" http.request.uri="/v2/<project>/<repo>/demo-go-app/manifests/sha256:b86d808fd22197eb01f4aeecff490a5a7c50c06db7828afb00f7fc06d40172a8" http.request.useragent=harbor-registry-client http.response.contenttype="application/json; charset=utf-8" http.response.duration=22m51.241153516s http.response.status=500 http.response.written=70 vars.name="<project>/<repo>/demo-go-app" vars.reference="sha256:b86d808fd22197eb01f4aeecff490a5a7c50c06db7828afb00f7fc06d40172a8"
Steps to reproduce the problem:
- Use GCS for registry backend storage
- Generate many (thousands?) of tags for the same repository
- Delete all or most of the tags (we use a retention policy)
- Run GC and observe the above errors
Versions: Please specify the versions of the following systems.
- harbor version: v2.3.3
- docker engine version: N/A
- docker-compose version: N/A
Additional context:
- We use GCS for registry backend storage