goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

HA Harbor with two registry pods and two core pods: image push sometimes fails with "blob upload invalid" error #17504

Closed: danielzhanghl closed this issue 2 years ago

danielzhanghl commented 2 years ago

Hi Harbor team, my setup runs on a Kubernetes platform (the behavior is the same on 2.5.0 and 2.6.0). It has two registry pods, both mounting the same GlusterFS RWX volume, and two core pods. Pushing an image sometimes fails with "blob upload invalid"; after retrying a few times, the image can eventually be pushed. The error looks like this:

    time="2022-09-05T21:14:56Z" level=fatal msg="writing blob: uploading layer to https://1.2.3.4:30003/v2/library/centos/blobs/uploads/d5eb04c8-58eb-47f4-a79a-1818164b2c31?_state=rQ_Qlg8wPQNllmnlVgAhE7ciMfDOcjK5zQSwrzCrcm97Ik5hbWUiOiJsaWJyYXJ5L2NlbnRvcyIsIlVVSUQiOiJkNWViMDRjOC01OGViLTQ3ZjQtYTc5YS0xODE4MTY0YjJjMzEiLCJPZmZzZXQiOjI5MSwiU3RhcnRlZEF0IjoiMjAyMi0wOS0wNVQyMToxNDo1NloifQ%3D%3D&digest=sha256%3A54f1c806bb294489fb75dd463096b31fb26daf1ee36b6cd17ce390693bfb4763: blob upload invalid"
    command terminated with exit code 1
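The _state query parameter in the failing URL can be decoded to confirm which upload session failed and how far it had progressed. This is a minimal sketch, assuming the value follows the docker/distribution layout: URL-safe base64 over a 32-byte HMAC-SHA256 signature followed by a JSON upload-state document. The helper name decode_upload_state is hypothetical. The sketch only undoes the %3D percent-encoding seen in this URL, and it does not verify the signature, because that requires the registry's secret.

```shell
#!/bin/sh
# decode_upload_state: hypothetical helper, not part of Harbor or docker/distribution.
# Assumes _state is URL-safe base64 of a 32-byte HMAC-SHA256 signature followed by JSON.
decode_upload_state() {
  # Undo the URL-encoding of the base64 padding ('%3D' -> '='),
  # map the URL-safe alphabet back to the standard one ('_' -> '/', '-' -> '+'),
  # decode to raw bytes, then skip the 32 signature bytes and print only the JSON.
  printf '%s' "$1" \
    | sed 's/%3D/=/g' \
    | tr '_-' '/+' \
    | base64 -d \
    | tail -c +33
  echo
}

# _state value copied from the failing upload URL above.
state='rQ_Qlg8wPQNllmnlVgAhE7ciMfDOcjK5zQSwrzCrcm97Ik5hbWUiOiJsaWJyYXJ5L2NlbnRvcyIsIlVVSUQiOiJkNWViMDRjOC01OGViLTQ3ZjQtYTc5YS0xODE4MTY0YjJjMzEiLCJPZmZzZXQiOjI5MSwiU3RhcnRlZEF0IjoiMjAyMi0wOS0wNVQyMToxNDo1NloifQ%3D%3D'

decode_upload_state "$state"
```

For this token, the output names the repository (library/centos), the upload UUID d5eb04c8-58eb-47f4-a79a-1818164b2c31, an offset of 291 bytes, and the start time. These values can be matched against the logs of each registry pod.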

From the registry pod logs, it seems that when the blobs are pushed, the requests go to both registry pods. After the blobs are uploaded, the manifest is uploaded, and at that point the registry pod cannot find some blobs, even though the logs show they were already uploaded. For example: blob1 -> registry pod 1, blob2 -> registry pod 2, manifest -> registry pod 2.

Is it possible that blob1 is not yet visible on registry pod 2 when the manifest is uploaded?

I also tried setting sessionAffinity: ClientIP on the registry SVC, but it does not help, because I found that the remote IP differs between the blob uploads and the manifest upload.
Could you comment on what might cause the remote IP to differ between the blob uploads and the manifest upload?

The attached upload logs (debug.zip) relate to blob upload ID "d5eb04c8-58eb-47f4-a79a-1818164b2c31". They show two remote addresses (1.2.3.4 and 172.30.254.27), even though everything comes from a single image push command.

Thanks for any comments!

danielzhanghl commented 2 years ago

This looks similar to the known issue described here: https://docs.openshift.com/container-platform/3.3/install_config/registry/registry_known_issues.html#:~:text=blob%20upload%20invalid%20These%20errors%20are%20returned%20by,in%20the%20synchronization%20of%20file%20attributes%20across%20nodes.

I still don't know why the remote IP differs within a single push command.

danielzhanghl commented 2 years ago

I tuned the GlusterFS volume as follows. Hopefully these settings are useful to others:

    # gluster volume set  ... group db-workload
    volume set: success
    # gluster volume set  ...  performance.write-behind-trickling-writes off
    volume set: success
    # gluster volume set  ..  performance.flush-behind off
    volume set: success
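
To reapply and double-check these settings on another volume, the three commands can be wrapped in a small script that reads the two explicit options back with gluster volume get. This is a sketch only. The function name apply_registry_volume_tuning is hypothetical, the caller supplies the volume name (it is elided in the commands above), and the script assumes a GlusterFS release that supports gluster volume get.

```shell
#!/bin/sh
# apply_registry_volume_tuning: hypothetical wrapper around the three settings
# from this thread; the volume name is passed in as the first argument.
apply_registry_volume_tuning() {
  vol="$1"
  [ -n "$vol" ] || { echo "usage: apply_registry_volume_tuning VOLNAME" >&2; return 2; }

  # Apply the settings that were reported to make the errors go away.
  gluster volume set "$vol" group db-workload || return 1
  gluster volume set "$vol" performance.write-behind-trickling-writes off || return 1
  gluster volume set "$vol" performance.flush-behind off || return 1

  # Read the two explicitly set options back so the change can be confirmed.
  gluster volume get "$vol" performance.write-behind-trickling-writes || return 1
  gluster volume get "$vol" performance.flush-behind
}
```

Run it as apply_registry_volume_tuning followed by the volume name, on a GlusterFS server node. The script stops at the first command that fails.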

danielzhanghl commented 2 years ago

After applying these settings, the issue went away, so they seem to have helped.