goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

How to configure timeout for image scan #16160

Closed ShabiDabi closed 2 years ago

ShabiDabi commented 2 years ago

Can someone please advise on a way to configure a longer timeout for image scanning? I have a big registry and sometimes I get context deadline exceeded when scanning images. Is it possible to configure this using scanner metadata?

EvgenyAGRO commented 2 years ago

I'm struggling with the same problem. @danielpacak Do you maybe have a suggestion on how to configure longer timeouts for harbor scanners?

wy65701436 commented 2 years ago

You can increase the timeout by setting SCANNER_TRIVY_TIMEOUT in the trivy-adapter env. @danielpacak, please correct me if I'm wrong.

EvgenyAGRO commented 2 years ago

@wy65701436 Would this work if I wrote a custom scanner and am not using Trivy? I mean, can I just return SCANNER_TRIVY_TIMEOUT with some custom value in my scanner metadata, and Harbor will treat it as the timeout for the image scan?

wy65701436 commented 2 years ago

No, it's only for Trivy. All the common behavior is defined in the scanner spec.

EvgenyAGRO commented 2 years ago

@wy65701436 I really couldn't find clear documentation on this. Can you please provide a URL to the documentation of the possible settings for a custom scanner?

ShabiDabi commented 2 years ago

@wy65701436 @danielpacak, I'm using a custom scanner (not trivy) as well. Can you please share the common behavior scanner spec?

heww commented 2 years ago

The default timeout for the scan job is 30 min. The scanner must return the scan report within 30 min; otherwise Harbor will treat the scan job as failed.

https://github.com/goharbor/harbor/blob/main/src/pkg/scan/job.go#L55 This is where the timeout for the scan job is defined.

BTW, what's the size of your image? How long will the scanner take to pull it from the registry?

danielpacak commented 2 years ago

As pointed out by @wy65701436, for the Trivy scanner adapter you can update the timeout by setting the SCANNER_TRIVY_TIMEOUT environment variable. It defaults to 5 min, so even if Harbor has a 30 min timeout set for a scan job, the adapter service may fail sooner.

As a quick fix / workaround one can set the SCANNER_TRIVY_TIMEOUT variable in his deployment descriptor or env file read by Docker compose.

@EvgenyAGRO The Pluggable Scanners API does not dictate a set of configuration parameters that each scanner adapter must provide. The implementation and documentation are left to the security vendor. Some vendors may not even support timeouts. For Trivy, all configuration options are listed at https://github.com/aquasecurity/harbor-scanner-trivy#configuration

After all, I think it makes sense to add the timeout parameter to the trivy config in harbor.yml and similar Helm chart values. WDYT?

# Trivy configuration
trivy:
  # ignoreUnfixed The flag to display only fixed vulnerabilities
  ignore_unfixed: false
  # skipUpdate The flag to enable or disable Trivy DB downloads from GitHub
  skip_update: false
  # insecure The flag to skip verifying registry certificate
  insecure: false
  # timeout The duration to wait for scan completion
  timeout: 5m                   # <--- NEW PARAM

EvgenyAGRO commented 2 years ago

@heww The image size is not particularly big. The problem is that I have limited scanning resources, so I keep an internal queue of arriving requests, and by the time I reach some of them the time limit has passed and Harbor counts them as timed out. Ideally, for example in the case of "Scan All", I would want Harbor to send requests in batches and wait for responses between batches, meaning that only after I have handled a certain batch (responded with successes/failures) will the next one be sent. This way the load would be predictable. Right now, Harbor just sends everything in batches, in which case I can't handle the load (it takes more than 30 minutes to reach some requests) and it causes timeouts. If I also can't increase the global timeout (as you said, it's not configurable), and in my case I also can't auto-scale my scanning resources, how can I handle that?
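One possible way for a custom adapter to cope with a fixed limit like this is to estimate, when a request arrives, whether it can realistically finish before the deadline, and to fail it early otherwise instead of letting it time out in the queue. The following is only a minimal sketch under stated assumptions: it assumes the 30 min limit effectively runs from the moment the request is queued, and `QueueState`, `estimated_finish_seconds`, and `should_accept` are hypothetical names that are not part of Harbor or the scanner spec. The average scan time would have to come from your own measurements.

```python
from dataclasses import dataclass

# 30 min scan job timeout on the Harbor side, as described in this thread.
HARBOR_SCAN_JOB_TIMEOUT_S = 30 * 60


@dataclass
class QueueState:
    """Hypothetical snapshot of the adapter's internal queue."""
    pending: int             # requests queued or currently being scanned
    workers: int             # scans that can run concurrently
    avg_scan_seconds: float  # measured average duration of one scan


def estimated_finish_seconds(state: QueueState) -> float:
    """Rough estimate of when a newly arrived request would finish."""
    # Full "rounds" of scans that must complete before a worker is free.
    rounds_ahead = state.pending // state.workers
    # Add one more round for the new request's own scan.
    return (rounds_ahead + 1) * state.avg_scan_seconds


def should_accept(state: QueueState, safety_margin: float = 0.9) -> bool:
    """Accept only if the estimate fits within a fraction of the deadline."""
    budget = HARBOR_SCAN_JOB_TIMEOUT_S * safety_margin
    return estimated_finish_seconds(state) <= budget
```

With 2 workers and an average scan of 120 seconds, this estimate would accept a request arriving at an empty queue and reject one arriving behind 40 pending requests. It does not make Harbor wait longer; it only makes the failure immediate and predictable.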

EvgenyAGRO commented 2 years ago

@danielpacak Not sure I understand what you meant here:

As a quick fix / workaround one can set the SCANNER_TRIVY_TIMEOUT variable in his deployment descriptor or env file read by Docker compose.

Should it be added when deploying Harbor itself? Unfortunately I don't always have control of Harbor deployment.

Also not sure regarding this:

After all, I think it makes sense to add the timeout parameter to the trivy config in harbor.yml and similar Helm chart values. WDYT?

You said that the implementation and documentation are left to the security vendor. I'm still trying to understand what it takes to support a longer timeout without using the Trivy scanner. Suppose I have a custom scanner and I want a longer timeout, how is it done? Do I need to develop some adapter that will be loaded as part of the Harbor deployment, so that my scanner will support additional configuration through that adapter? Not sure I understand how it should work. Is it documented anywhere?

danielpacak commented 2 years ago

@EvgenyAGRO If you don't control the deployment of Harbor you won't be able to set the SCANNER_TRIVY_TIMEOUT environment variable.

The only way to integrate a custom scanner with the Harbor registry is by developing an adapter service (exposing JSON over HTTP endpoints) and configuring it under Interrogation Service in the Harbor UI. It's pretty well documented in the Harbor docs and the Pluggable Scanners API I already referenced. Each adapter service has its own configuration. In particular, it may configure a scan timeout.

Check out existing scanner adapters to understand how the integration works:

EvgenyAGRO commented 2 years ago

@danielpacak Maybe I was not clear. I already implemented a scanner according to the spec and added it in the Harbor UI. I have the scanner metadata, scan, and report endpoints, and everything is working. My question was: how do I extend the built-in 30 min timeout on the Harbor side? Is it configurable through the scanner metadata endpoint? The only timeout configuration I found is the Retry-Timeout setting, the rate at which Harbor will retry the Report endpoint.

danielpacak commented 2 years ago

@danielpacak Maybe I was not clear. I already implemented a scanner according to the spec and added it in the Harbor UI. I have the scanner metadata, scan, and report endpoints, and everything is working. My question was: how do I extend the built-in 30 min timeout on the Harbor side? Is it configurable through the scanner metadata endpoint? The only timeout configuration I found is the Retry-Timeout setting, the rate at which Harbor will retry the Report endpoint.

Yeah, that was kind of confusing, especially since this issue is labeled with scanner/trivy and I assumed that @ShabiDabi was referring to the default vulnerability scanner, which is Trivy.

Anyway, @heww already mentioned that the timeout of a scan job in Harbor is hardcoded to 30 mins (see https://github.com/goharbor/harbor/blob/main/src/pkg/scan/job.go#L55). Therefore, it's not configurable and we do not pass it in a scan request to the scanner adapter service.

EvgenyAGRO commented 2 years ago

@danielpacak Thanks for the answers, I appreciate your patience. Nevertheless, you somehow managed to control it using Trivy. Is it because Trivy has some special integration with Harbor? I'm wondering what it takes for a custom scanner to support this as well.

ShabiDabi commented 2 years ago

Hi @danielpacak, @EvgenyAGRO and I are working on the same use case, for a custom scanner that doesn't use Trivy. So I'm joining @EvgenyAGRO's question about what it takes to add such a configuration. Is it something that we can ask Harbor to add?

danielpacak commented 2 years ago

There is no special integration between Harbor and Trivy. You can check the code at https://github.com/aquasecurity/harbor-scanner-trivy and see how we use the SCANNER_TRIVY_TIMEOUT environment variable throughout the code.
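To illustrate the general idea of an adapter owning its own timeout, here is a minimal sketch of how a custom adapter might read a timeout from its environment and enforce it around a scan. It is not taken from harbor-scanner-trivy: `SCANNER_CUSTOM_TIMEOUT` is a hypothetical variable name, the 5m default merely mirrors the Trivy adapter default mentioned above, and `parse_duration` only handles simple values such as `90s`, `5m`, or `1h`.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> float:
    """Parse a simple duration such as '90s', '5m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def scan_timeout_seconds(env=os.environ) -> float:
    # SCANNER_CUSTOM_TIMEOUT is a hypothetical name for a custom adapter.
    return parse_duration(env.get("SCANNER_CUSTOM_TIMEOUT", "5m"))


def run_scan_with_timeout(scan, image_ref: str, timeout_s: float):
    """Run scan(image_ref) and raise TimeoutError if it takes too long."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(scan, image_ref)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(
            f"scan of {image_ref} exceeded {timeout_s} seconds"
        ) from None
    finally:
        # Do not block the caller; note that the worker thread is not
        # killed and keeps running until the scan function returns.
        pool.shutdown(wait=False)
```

A real adapter would also need a way to actually cancel the underlying scan (for example by terminating a subprocess), which this sketch does not attempt. Whatever value is configured here is independent of the 30 min scan job timeout on the Harbor side.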

EvgenyAGRO commented 2 years ago

@danielpacak Sorry for bugging you with this. Just really trying to understand what it takes to support this. From my understanding I first need to control the Harbor deployment, as I see that it's possible to pass configurable parameters (https://github.com/goharbor/harbor-helm#configuration) when deploying Harbor, such as trivy.enabled:

helm install harbor harbor/harbor \
  --create-namespace \
  --namespace harbor \
  --set clair.enabled=false \
  --set trivy.enabled=true

Do I also need my scanner to be acknowledged by Harbor? To add a section with my custom parameters? I checked the code at https://github.com/aquasecurity/harbor-scanner-trivy, but eventually something on the Harbor side needs to honor the env variables you are returning in the scanner metadata:

        "env.SCANNER_TRIVY_TIMEOUT":        h.config.Trivy.Timeout.String(),

So there is obviously something that also needs to be added on the Harbor side, right?

danielpacak commented 2 years ago

No worries @EvgenyAGRO We'll sort it out.

The hardcoded 30 min timeout in Harbor is completely independent of the timeout that you can configure for harbor-scanner-trivy via the SCANNER_TRIVY_TIMEOUT env, which defaults to 5 min. That's why my initial answer was to update SCANNER_TRIVY_TIMEOUT from 5 min to 30 min to give Trivy more time to scan. Otherwise, even if Harbor waited 30 mins, some scan requests would fail after just 5 mins.

What you probably have in mind is to let users configure the scan job timeout in Harbor (core / job service), which is currently not possible because it's hardcoded in https://github.com/goharbor/harbor/blob/main/src/pkg/scan/job.go#L55. In other words, you don't want to implement any timeout logic in your adapter service and would rather let Harbor control scan job timeouts in one place. Is that correct?
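Put differently, a scan is cut off by whichever of the two independent timeouts expires first. A minimal sketch of that arithmetic, assuming both clocks start at roughly the same time; `effective_scan_deadline` is a hypothetical helper, not something Harbor or the adapter exposes:

```python
from datetime import timedelta

# Hardcoded scan job timeout on the Harbor side, per this thread.
HARBOR_SCAN_JOB_TIMEOUT = timedelta(minutes=30)


def effective_scan_deadline(
    adapter_timeout: timedelta,
    job_timeout: timedelta = HARBOR_SCAN_JOB_TIMEOUT,
) -> timedelta:
    """Return the time budget a scan really has: the smaller of the two."""
    return min(adapter_timeout, job_timeout)
```

With the adapter default of 5 min the budget is 5 min, and raising the adapter timeout beyond 30 min gains nothing because the Harbor side then becomes the limit.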

EvgenyAGRO commented 2 years ago

@danielpacak Oh I see.. I misunderstood the meaning of this parameter. Yes that is correct. But I now understand that it's currently not possible to extend the default timeout.

EvgenyAGRO commented 2 years ago

@danielpacak Is it something that might be considered in future releases? It would be nice if it could be configurable in scanner metadata (or per scan request), as there might be slower scanners, very big images, or unusual peaks during scanning which can cause slower request handling. So I think it makes sense for it to be configurable. WDYT?

danielpacak commented 2 years ago

For me it makes perfect sense to let installers configure scan job timeout one day. @heww WDYT?

ShabiDabi commented 2 years ago

For me it makes perfect sense to let installers configure scan job timeout one day. @heww WDYT?

This would be very helpful for us. We would appreciate it if you could allow such configuration.

drehpehs commented 2 years ago

@danielpacak Hi, I just noticed this issue recently. Currently our image scanning takes quite a long time to finish (#16630).

We are using Trivy as our scanner now. I would like to know what happens if we add SCANNER_TRIVY_TIMEOUT and set it to 5 mins or less. Does it mean that an image scan will fail once it exceeds that time, and the next one will be taken? I just wonder whether it can speed up the image scanning (though the number of failed scans could increase). Thank you.

danielpacak commented 2 years ago

In case of timeout Harbor will automatically retry a failed job. You can also trigger rescan manually through Harbor portal / UI.

When it comes to setting the value of the timeout (in Trivy), it really depends on the contents stored in the repository and compute resources of the machine(s) where Harbor is installed. 5 mins is usually enough, but we've seen giant container images, > 3GB, that require increasing the timeout duration accordingly.

Typically, you'd monitor scan duration metrics and tune the duration appropriately.
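As a rough illustration of that tuning step, the sketch below derives a candidate timeout from observed scan durations by taking a high percentile and adding headroom. The function name, the 95th percentile, and the 1.5x headroom are all assumptions chosen for illustration, not recommendations from Harbor or Trivy.

```python
import math
from typing import Sequence


def suggest_timeout_seconds(
    durations: Sequence[float],
    percentile: float = 95.0,
    headroom: float = 1.5,
) -> float:
    """Suggest a timeout from past scan durations (in seconds)."""
    if not durations:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations)
    # Nearest-rank percentile: smallest value covering `percentile` percent.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1] * headroom
```

The result can then be rounded up to a convenient value and set as the adapter timeout, keeping in mind that it cannot usefully exceed the 30 min limit on the Harbor side.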

github-actions[bot] commented 2 years ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.