goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

Issue during push "client disconnected during blob PATCH" #20472

Closed gwagner681105 closed 3 weeks ago

gwagner681105 commented 5 months ago

Harbor version: 2.9, Kubernetes installation

We have a repo where the upload of a particular image is no longer possible. A previous push may have run into the quota and possibly left some leftovers in the repo's storage folder.

{"stream":"stderr","logtag":"F","message":"time=\"2024-05-22T04:25:34.394398505Z\" level=error msg=\"client disconnected during blob PATCH\" auth.user.name=\"harbor_registry_user\" contentLength=-1 copied=94383255 error=\"unexpected EOF\" go.version=go1.20.7 http.request.host=harbor.abraxas-tools.ch http.request.id=e2f33e77-52b4-40c3-b7d0-bc6c014bd2c0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/data/products/airflow/cache/blobs/uploads/01084cd7-ab53-418a-8d52-7e30a529c755?_state=ZOpaG69hMQ-FKDFlASFea5KG2PWDMsFLq6DyXqS2vGJ7Ik5hbWUiOiJkYXRhL3Byb2R1Y3RzL2FpcmZsb3cvY2FjaGUiLCJVVUlEIjoiMDEwODRjZDctYWI1My00MThhLThkNTItN2UzMGE1MjljNzU1IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA1LTIyVDA0OjI1OjI0LjMzMjQ2MDYyNVoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.3.0\" vars.name=\"data/products/airflow/cache\" vars.uuid=01084cd7-ab53-418a-8d52-7e30a529c755 "}

I only see the cache folder in the GUI --> [data/products/airflow/cache]. We tried to delete this path from the GUI, but the upload failed again.

I have seen that, even though I deleted the path data/products/airflow/cache in the GUI, we still have the data on the storage at /docker/registry/v2/repositories/data/products/airflow

Is there a way to force the upload or to clean up the leftovers in order to get the upload working again?
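For anyone wondering where those leftovers live: with the registry's default filesystem driver, interrupted blob uploads accumulate under each repository's `_uploads` directory, and AFAIK deleting the repository in the Harbor GUI does not touch that on-disk upload state. A minimal sketch of where to look -- the storage root and session UUID below are illustrative, recreated in a temp dir rather than taken from a real install:

```shell
# Sketch only: recreate the registry's default on-disk layout in a temp dir
# (a real install would use its actual storage root, e.g. /docker/registry/v2).
REGISTRY_ROOT=$(mktemp -d)
mkdir -p "$REGISTRY_ROOT/repositories/data/products/airflow/cache/_uploads/01084cd7-ab53-418a-8d52-7e30a529c755"

# Interrupted blob uploads live under each repository's _uploads directory;
# listing them shows which upload sessions a failed PATCH left behind.
find "$REGISTRY_ROOT/repositories" -type d -name _uploads
```

Removing stale session directories there, or enabling the distribution registry's upload purging (`storage.maintenance.uploadpurging` in the registry config), is what clears these leftovers.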

gwagner681105 commented 5 months ago

I forgot to mention that the client behavior was as follows:

All layers except one were uploaded successfully; only one layer failed, and the Docker client ended with the error message "connection reset by peer" or "use of closed network connection".

In the log of the core pod (Kubernetes pod), the error "http: proxy error: context canceled" appears.

MinerYang commented 5 months ago

Hi @gwagner681105, I am curious how you got into this issue. Could you share the exact command you ran and more of the registry log?

gwagner681105 commented 5 months ago

Hi @MinerYang, I am a bit further. I found out that our WAF, which sits in between, possibly causes the issue. At least one of the failing use cases works when bypassing the WAF system. I am currently waiting for confirmation regarding the second failing use case.

stonezdj commented 5 months ago

Maybe your WAF system blocks the PATCH method?

gwagner681105 commented 5 months ago

I am currently analysing the issue on our WAF and will post the result here.

stonezdj commented 5 months ago

Can you please post the full registry log? Please grep for the PATCH method's return code in registry.log.
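Something along these lines pulls that out (a sketch: `registry.log` is written here with two sample lines in the distribution registry's logrus key=value format, so the pipeline is self-contained -- on a real system, run the greps against the actual registry container log):

```shell
# Write two sample registry log lines so the pipeline below has input;
# in practice, use the real registry.log instead.
cat > registry.log <<'EOF'
time="2024-06-05T11:13:49Z" level=info msg="response completed" http.request.method=PATCH http.response.status=202
time="2024-06-05T11:13:52Z" level=error msg="client disconnected during blob PATCH" http.request.method=PATCH
EOF

# Count PATCH responses by status code (only completed requests carry
# http.response.status; aborted ones like the second line drop out).
grep 'http.request.method=PATCH' registry.log \
  | grep -o 'http.response.status=[0-9]*' \
  | sort | uniq -c
```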

Kajot-dev commented 5 months ago

I had a similar issue once. May I ask what kind of storage you are using, exactly? If it's S3, which implementation? For me it was Ceph S3.

gwagner681105 commented 5 months ago

We are using NFS.

File with Patch Operation: Explore-logs-2024-06-04 09_04_19.json

The failure rate has decreased over time. We only had one occurrence in the last 48 hours: Explore-logs-2024-06-04 09_08_32.json

I will have a look at the issue this week (the WAF team is always very busy) ;-)

gwagner681105 commented 5 months ago

Finally, we had a look at the issue together with our WAF engineer. They don't see any anomalies on the WAF. We only see a PATCH request returning HTTP 202, followed by one returning 499.

10.73.64.85 - - [05/Jun/2024:13:13:49 +0200] "PATCH /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D HTTP/1.0" 202 0 "-" "kaniko/v1.15.0" harbor.abraxas-tools.ch 242 - TLSv1.3;TLS13-AES256-GCM-SHA384;256 /abraxas/vs-harbor.abraxas-tools.ch_internal

10.73.64.85 - - [05/Jun/2024:13:13:52 +0200] "PATCH /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/a7d87790-b482-4278-9c67-8a7510ffd69c?_state=iJ-rRrik9UGoxjeiKbpYKXgfVRJqbgScTJhbyDHehaR7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiYTdkODc3OTAtYjQ4Mi00Mjc4LTljNjctOGE3NTEwZmZkNjljIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ4Ljg0MDA4MTEyNFoifQ%3D%3D HTTP/1.0" 499 21 "-" "kaniko/v1.15.0" harbor.abraxas-tools.ch 3904 - TLSv1.3;TLS13-AES256-GCM-SHA384;256 /abraxas/vs-harbor.abraxas-tools.ch_internal

Corresponding Harbor log:

2024-06-05 13:13:52.891 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:52.890874824Z\" level=error msg=\"client disconnected during blob PATCH\" auth.user.name=\"harbor_registry_user\" contentLength=-1 copied=140676763 error=\"unexpected EOF\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=17a4c5be-02ac-4410-b0c4-e720da983ba0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/a7d87790-b482-4278-9c67-8a7510ffd69c?_state=iJ-rRrik9UGoxjeiKbpYKXgfVRJqbgScTJhbyDHehaR7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiYTdkODc3OTAtYjQ4Mi00Mjc4LTljNjctOGE3NTEwZmZkNjljIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ4Ljg0MDA4MTEyNFoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.15.0\" vars.name=\"ste_publi/deklaration/webapps/declaration-webapp/cache\" vars.uuid=a7d87790-b482-4278-9c67-8a7510ffd69c "}

2024-06-05 13:13:50.210 | {"stream":"stdout","logtag":"F","message":"100.125.140.175 - - [05/Jun/2024:11:13:49 +0000] \"PUT /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=Stt57n4r7u8lEWOkWOH5EgGOMPe4E8ycGOIhW-8rcpJ7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjozMjcsIlN0YXJ0ZWRBdCI6IjIwMjQtMDYtMDVUMTE6MTM6NDlaIn0%3D&digest=sha256%3A32b2336bada1e1f24b9f7c45a989b10e5882886fd4aac43a357f5027ff5d7290 HTTP/1.1\" 201 0 \"\" \"kaniko/v1.15.0\""}

2024-06-05 13:13:50.210 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:50.210557719Z\" level=info msg=\"response completed\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=2e77e46b-16a2-4153-9cc5-974a9d0782be http.request.method=PUT http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=Stt57n4r7u8lEWOkWOH5EgGOMPe4E8ycGOIhW-8rcpJ7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjozMjcsIlN0YXJ0ZWRBdCI6IjIwMjQtMDYtMDVUMTE6MTM6NDlaIn0%3D&digest=sha256%3A32b2336bada1e1f24b9f7c45a989b10e5882886fd4aac43a357f5027ff5d7290\" http.request.useragent=\"kaniko/v1.15.0\" http.response.duration=297.081707ms http.response.status=201 http.response.written=0 "}

2024-06-05 13:13:50.015 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:50.014896031Z\" level=info msg=\"authorized request\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=2e77e46b-16a2-4153-9cc5-974a9d0782be http.request.method=PUT http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=Stt57n4r7u8lEWOkWOH5EgGOMPe4E8ycGOIhW-8rcpJ7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjozMjcsIlN0YXJ0ZWRBdCI6IjIwMjQtMDYtMDVUMTE6MTM6NDlaIn0%3D&digest=sha256%3A32b2336bada1e1f24b9f7c45a989b10e5882886fd4aac43a357f5027ff5d7290\" http.request.useragent=\"kaniko/v1.15.0\" vars.name=\"ste_publi/deklaration/webapps/declaration-webapp/cache\" vars.uuid=8ca49121-5b56-46d5-834f-2d2b112d30b9 "}

2024-06-05 13:13:49.495 | {"stream":"stdout","logtag":"F","message":"100.125.140.175 - - [05/Jun/2024:11:13:49 +0000] \"PATCH /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D HTTP/1.1\" 202 0 \"\" \"kaniko/v1.15.0\""}

2024-06-05 13:13:49.495 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:49.49528143Z\" level=info msg=\"response completed\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=066bfd53-4f53-46c3-9419-c0919a268ea0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.15.0\" http.response.duration=186.10932ms http.response.status=202 http.response.written=0 "}

2024-06-05 13:13:49.443 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:49.443329388Z\" level=info msg=\"authorized request\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=066bfd53-4f53-46c3-9419-c0919a268ea0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.15.0\" vars.name=\"ste_publi/deklaration/webapps/declaration-webapp/cache\" vars.uuid=8ca49121-5b56-46d5-834f-2d2b112d30b9 "}

2024-06-05 13:13:49.234 | {"stream":"stdout","logtag":"F","message":"100.125.140.175 - - [05/Jun/2024:11:13:48 +0000] \"POST /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/ HTTP/1.1\" 202 0 \"\" \"kaniko/v1.15.0\""}

2024-06-05 13:13:49.234 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:49.234746598Z\" level=info msg=\"response completed\" go.version=go1.20.7 http.request.contenttype=\"application/json\" http.request.host=harbor.abraxas-tools.ch http.request.id=a9f6da61-4533-4613-a6c3-870ed8bde2b7 http.request.method=POST http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/\" http.request.useragent=\"kaniko/v1.15.0\" http.response.duration=290.573139ms http.response.status=202 http.response.written=0 "}
bqio commented 5 months ago

I have the same problem. Did you manage to find the reason?

I am using Traefik / Docker Swarm / registry:2.

gwagner681105 commented 5 months ago

Our WAF team did not find any anomalies on their side. We have an F5 WAF system. Funnily enough, the issue disappeared by itself; the last occurrence was 5 days ago. In any case, we will update to 2.11 and keep an eye on it.

patzm commented 5 months ago

I am using Traefik as the reverse proxy in front of a plain-HTTP Harbor stack (connected on the local network). I always ran into this issue after 60 seconds. It turns out I needed to increase the readTimeout:

entrypoints:
  websecure:
    address: ":443"
    transport:
      respondingTimeouts:
        readTimeout: 1800

Also important, in harbor.yml:

    redirect:
      disable: true

Kajot-dev commented 5 months ago

> I am using traefik as the reverse proxy before a http Harbor stack (connected on the local network). I ran into this issue always after 60 seconds. Turns out, I needed increase the readTimeout:
>
> entrypoints:
>   websecure:
>     address: ":443"
>     transport:
>       respondingTimeouts:
>         readTimeout: 1800
>
> also important in the harbor.yml:
>
>     redirect:
>       disable: true

AFAIK `redirect` only applies to S3 when pulling images, so I don't think that's related.

waipeng commented 4 months ago

Just want to leave a comment in case anyone runs into this like us. In our case, this was due to a firewall. It flagged python:3.10-bookworm as Virus/Linux.WGeneric.eizzgy.

We terminate SSL outside of the cluster. The firewall intercepted HTTP traffic from the LB to the cluster and sent a RST to harbor-core.

Hope this helps someone.

ldacey commented 3 months ago

> Just want to leave a comment in case anyone runs into this like us. In our case, this was due to a firewall. It flagged python:3.10-bookworm as Virus/Linux.WGeneric.eizzgy.
>
> We terminate SSL outside of the cluster. The FW intercepted HTTP traffic from LB to cluster, and sent a RST to harbor-core.
>
> Hope this helps someone.

Thank you for leaving this comment. I had been facing issues pushing some images for a week, but the networking team insisted there had been no changes and that no traffic was being blocked. Only specific layers were failing with the blob PATCH error, and I could not get it to work no matter what.

Apparently, python:3.12-slim-bookworm was also being flagged as Virus/Linux.WGeneric.eizzgy. I provided your comment to the networking team and they were able to fix my issue.

zibellon commented 3 months ago

@patzm I LOVE YOU! YOU ARE THE BEST!

I spent 2 days fixing the same issue. My infrastructure: Docker Swarm, Traefik (for GitLab), GitLab (Docker), and a number of other containers.

I had thought the problem was in the GitLab nginx settings...

THANK YOU :)))))

The official Traefik docs (https://doc.traefik.io/traefik/v2.11/routing/entrypoints/#transport) mention no such limits. A mistake in the docs?

I added the following lines to my Traefik config:

  [entryPoints.websecure]
    address = ":443"
    # New lines from here
    [entryPoints.websecure.transport]
      [entryPoints.websecure.transport.respondingTimeouts]
        idleTimeout = 600
        writeTimeout = 600
        readTimeout = 600

github-actions[bot] commented 1 month ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.