goharbor / harbor-helm

The helm chart to deploy Harbor
Apache License 2.0

Trivy Scanner Client.Timeout exceeded while awaiting header #1860

Open qcserestipy opened 1 week ago

qcserestipy commented 1 week ago

Dear all,

We have been using the Harbor Helm chart in our production environments for a long time. Recently, we started experiencing connection timeouts between the harbor-core component and the trivy-scanner-adapter:

image

The Trivy scanner is shown as unhealthy even though the deployment itself reports no issues. In the harbor-core container logs we see the following errors:

/harbor/src/server/middleware/middleware.go:57, github.com/goharbor/harbor/src/core/middlewares.MiddleWares.Middleware.New.func14.1
2024-11-18T15:44:08Z [ERROR] [/controller/scanner/base_controller.go:322][error="v1 client: get metadata: Get "https://argo-harbor-trivy:8443/api/v1/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" requestID="563d23e433280f7c865e6b9baf8353e2"]: failed to ping scanner
2024-11-18T15:44:08Z [ERROR] [/jobservice/logger/service.go:99]: Get registration error: scanner controller: ping: v1 client: get metadata: Get "https://argo-harbor-trivy:8443/api/v1/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-11-18T15:44:08Z [ERROR] [/controller/scanner/base_controller.go:322][error="v1 client: get metadata: Get "https://argo-harbor-trivy:8443/api/v1/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" requestID="1b4acc0bf9a1bafb8368f0fe8dbba52b"]: failed to ping scanner
2024-11-18T15:44:08Z [ERROR] [/lib/http/error.go:57]: {"errors":[{"code":"UNKNOWN","message":"scanner controller: ping: v1 client: get metadata: Get \"https://argo-harbor-trivy:8443/api/v1/metadata\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}]}
/harbor/src/controller/scanner/base_controller.go:324, github.com/goharbor/harbor/src/controller/scanner.(*basicController).Ping

This timeout behavior is not tied to any specific event and occurs seemingly at random.

Using ksniff, we have already found that the Trivy scanner sends TCP RST packets to harbor-core whenever the issue occurs. To rule out a slow network as the cause, we would like to increase the connection timeout between the two components.

Is there a way to configure the connection timeout threshold via the Helm chart? Are there other measures we could take to test this behavior more thoroughly?

Support would be greatly appreciated!