Closed dioguerra closed 12 months ago
@dioguerra To your question:
First and foremost, what are the projects that the scanner triggers the scan jobs?
The answer is option 2: the ScanAll command scans all artifacts, independent of the option selected by the user.
To your question:
while newly pushed images are being actively scanned, the daily scanner does not pick up the images with a Vulnerabilities error state, even with Automatically scan images on push enabled.
The daily scanner (scanAll) runs independently of Automatically scan images on push. If it doesn't pick up the images with vulnerabilities, there may be a reason:
Yes. 1. was fixed by using a stateless .cache (I dropped the volumeMount manually from the StatefulSet).
For 2, I cannot do this, as activating some sort of global scanning would start scheduling a lot of jobs, which would affect (sometimes halt) normal registry operation. Our experience shows that any overload of the jobService runners affects registry image serving operations. https://github.com/goharbor/harbor/issues/17607
Do you have any other recommendations?
Note: I still need to validate this thread, which seems to schedule scan jobs for the retrieved artifacts. Maybe there is some filtering happening here? https://github.com/goharbor/harbor/blob/f21b1481bb5ba3efb9e3c1dd8c4e704d9dcc44a1/src/controller/scan/base_controller.go#L387
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
Still happening. The automatic daily scan job keeps scanning a random number of images. See below:
I also notice that scan_reports with no vulnerabilities have null reports (report_vulnerability_record is also empty):
SELECT id, uuid, digest, registration_uuid, mime_type, report
FROM public.scan_report
WHERE report->>'generated_at' Like '2023-06-30%'
ORDER BY id DESC;
shows 112 entries, although most probably this also accounts for recently pushed images that were pushed to projects with the scan on push option enabled. If I limit this to the autoscan start time (midnight), I see the number drop to 39 reports ('2023-06-30T00%').
How is the Vulnerability Scan All report generated?
Continuing from https://github.com/goharbor/harbor/issues/18455#issue-1647983689
The scan is created with the `ExecutionTriggerSchedule` associated executionID. Since the last time I posted here, my artifact count went up. For each artifact, a scan call is created.
SELECT count(id)
FROM public.artifact;
31094
Inside the scan call we do some things:
At this point the `artifacts` variable should hold all artifacts (multiple if referenced) that are not Accessories or ImageIndex artifacts.
launchScanJobParam.ExecutionID
Observation. Shouldn't this be:
supported := hasCapability(r, a)
if supported {
	artifacts = append(artifacts, a) // If the artifact is not supported, why do we add it to the list of artifacts?
	scannable = true
	return ar.ErrSkip // this artifact is supported by the scanner, skip walking its children
}
// What reason would require adding all other artifacts not supported by the scanner? Just to record a `not supported`? We could add them from here.
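To make the proposal above concrete, here is a minimal, self-contained sketch of a walk that keeps only artifacts the scanner supports. The `Artifact` type, the `supportedTypes` set, and `collectScannable` are hypothetical stand-ins for illustration, not Harbor's actual types or functions; the real walk in base_controller.go also deals with accessories and `ar.ErrSkip`, which are left out here.

```go
package main

import "fmt"

// Artifact is a hypothetical, simplified stand-in for Harbor's artifact model.
type Artifact struct {
	Digest            string
	ManifestMediaType string
	Children          []*Artifact
}

// supportedTypes is an assumption: it mirrors the two manifest media types
// used in the SQL filters in this thread, not the scanner's real capabilities.
var supportedTypes = map[string]bool{
	"application/vnd.oci.image.manifest.v1+json":           true,
	"application/vnd.docker.distribution.manifest.v2+json": true,
}

// collectScannable walks the artifact tree and returns only supported artifacts.
// A supported artifact is kept and its children are not visited; an unsupported
// one (for example an image index) is skipped, but its children are visited.
func collectScannable(root *Artifact) []*Artifact {
	var out []*Artifact
	var walk func(a *Artifact)
	walk = func(a *Artifact) {
		if supportedTypes[a.ManifestMediaType] {
			out = append(out, a)
			return
		}
		for _, c := range a.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

func main() {
	index := &Artifact{
		Digest:            "sha256:index",
		ManifestMediaType: "application/vnd.oci.image.index.v1+json",
		Children: []*Artifact{
			{Digest: "sha256:amd64", ManifestMediaType: "application/vnd.oci.image.manifest.v1+json"},
			{Digest: "sha256:arm64", ManifestMediaType: "application/vnd.oci.image.manifest.v1+json"},
		},
	}
	for _, a := range collectScannable(index) {
		fmt.Println(a.Digest)
	}
}
```

With this shape, an index contributes its child manifests to the list but never itself, so nothing unsupported reaches the job submission step.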
So, image indexes are not supported by Trivy. If we filter for the remaining artifacts, we get:
SELECT count(id)
FROM public.artifact
WHERE manifest_media_type IN ('application/vnd.oci.image.manifest.v1+json', 'application/vnd.docker.distribution.manifest.v2+json');
27445
I think this might also be causing the ScanAll report to be incorrectly aggregated. Does it make sense to exit early? Shouldn't all reports be aggregated?
for _, group := range groupReports {
	if len(group) != 0 {
		reports = append(reports, group...)
	}
	// else {
	// 	// NOTE: If the artifact is an OCI image, this happens when the artifact is not scanned,
	// 	// but its children artifacts may be scanned, so return an empty report
	// 	return nil, nil
	// }
}
I'm seeing some errors when Trivy tries to pull images from private repositories when triggered with scan_all (manually)?
It also seems that in version v2.7.x the job count is bigger (not sure whether this is due to the fact that Trivy fails because the S3 data is not there).
Does this fail because of some sort of timeout?
It seems that this problem is fixed in version v2.7.x? I will need to actively scan the images to make sure.
This is how I'm validating:
2023-08-17T00:21:56Z [DEBUG] [/pkg/scan/job.go:376]: registration:
2023-08-17T00:21:56Z [INFO] [/pkg/scan/job.go:387]: {
"uuid": "6809b473-11fb-11eb-93ee-e6a720a2df22",
"name": "Trivy",
"description": "The Trivy scanner adapter",
"url": "http://something:8080",
"disabled": false,
"is_default": true,
"health": "healthy",
"auth": "",
"access_credential": "[HIDDEN]",
"skip_certVerify": false,
"use_internal_addr": true,
"adapter": "Trivy",
"vendor": "Aqua Security",
"version": "v0.40.0",
"create_time": "2020-10-19T13:08:24.486231Z",
"update_time": "2023-08-16T11:50:05.497478Z"
}
2023-08-17T00:21:56Z [DEBUG] [/pkg/scan/job.go:376]: scanRequest:
2023-08-17T00:21:56Z [INFO] [/pkg/scan/job.go:387]: {
"registry": {
"url": "http://something:80",
"authorization": "[HIDDEN]"
},
"artifact": {
"namespace_id": 19,
"repository": "dtomasgu/gg/server",
"tag": "latest",
"digest": "sha256:ff6e6d5e2235adf383e184584a2bf9e881015f7ac49c9844bf2d505a674951ec",
"mime_type": "application/vnd.docker.distribution.manifest.v2+json"
}
}
2023-08-17T00:21:56Z [INFO] [/pkg/scan/job.go:167]: Report mime types: [application/vnd.security.vulnerability.report; version=1.1]
2023-08-17T00:21:56Z [INFO] [/pkg/scan/job.go:224]: Get report for mime type: application/vnd.security.vulnerability.report; version=1.1
2023-08-17T00:21:58Z [DEBUG] [/pkg/scan/job.go:237]: check scan report for mime application/vnd.security.vulnerability.report; version=1.1 at 2023/08/17 00:21:58
2023-08-17T00:21:58Z [INFO] [/pkg/scan/job.go:245]: Report with mime type application/vnd.security.vulnerability.report; version=1.1 is not ready yet, retry after 5 seconds
2023-08-17T00:22:03Z [DEBUG] [/pkg/scan/job.go:237]: check scan report for mime application/vnd.security.vulnerability.report; version=1.1 at 2023/08/17 00:22:03
2023-08-17T00:22:03Z [INFO] [/pkg/scan/job.go:245]: Report with mime type application/vnd.security.vulnerability.report; version=1.1 is not ready yet, retry after 5 seconds
2023-08-17T00:22:08Z [DEBUG] [/pkg/scan/job.go:237]: check scan report for mime application/vnd.security.vulnerability.report; version=1.1 at 2023/08/17 00:22:08
2023-08-17T00:22:08Z [INFO] [/pkg/scan/job.go:245]: Report with mime type application/vnd.security.vulnerability.report; version=1.1 is not ready yet, retry after 5 seconds
2023-08-17T00:22:13Z [DEBUG] [/pkg/scan/job.go:237]: check scan report for mime application/vnd.security.vulnerability.report; version=1.1 at 2023/08/17 00:22:13
2023-08-17T00:22:13Z [ERROR] [/pkg/scan/job.go:294]: check scan report with mime type application/vnd.security.vulnerability.report; version=1.1: running trivy wrapper: running trivy: exit status 1: 2023-08-17T00:22:10.256Z INFO Vulnerability scanning is enabled
2023-08-17T00:22:10.347Z FATAL image scan error: scan error: unable to initialize a scanner: unable to initialize a docker scanner: 5 errors occurred:
* unable to inspect the image (something:80/dtomasgu/gg/server@sha256:ff6e6d5e2235adf383e184584a2bf9e881015f7ac49c9844bf2d505a674951ec): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
* unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
* containerd socket not found: /run/containerd/containerd.sock
* GET http://something:80/v2/dtomasgu/gg/server/manifests/sha256:ff6e6d5e2235adf383e184584a2bf9e881015f7ac49c9844bf2d505a674951ec: MANIFEST_UNKNOWN: manifest unknown; map[Name:dtomasgu/gg/server Revision:sha256:ff6e6d5e2235adf383e184584a2bf9e881015f7ac49c9844bf2d505a674951ec]
* GET http://something:80/v2/dtomasgu/gg/server/manifests/sha256:ff6e6d5e2235adf383e184584a2bf9e881015f7ac49c9844bf2d505a674951ec: MANIFEST_UNKNOWN: manifest unknown; map[Name:dtomasgu/gg/server Revision:sha256:ff6e6d5e2235adf383e184584a2bf9e881015f7ac49c9844bf2d505a674951ec]
: general response handler: unexpected status code: 500, expected: 200
Here is some information on the scan jobs:
The database (which is not changing) contains 32255 artifacts:
```sql
SELECT count(*)
FROM public.artifact;
```
Of these artifacts, 28639 should be scannable by Trivy:
SELECT count(*)
FROM public.artifact AS a
LEFT JOIN public.scan_report AS sr
ON a.digest = sr.digest
WHERE
a.manifest_media_type IN ('application/vnd.oci.image.manifest.v1+json', 'application/vnd.docker.distribution.manifest.v2+json')
Below are the last SCAN_ALL executions triggered (manually and by schedule, in the following order):
SELECT id, vendor_type, vendor_id, status, status_message, trigger, extra_attrs, start_time, end_time, revision, update_time
FROM public.execution
WHERE vendor_type = 'SCAN_ALL'
ORDER BY id DESC
LIMIT 1000;
| Run | id | vendor_type | vendor_id | status | trigger | extra_attrs | start_time | end_time | revision | update_time |
|---|---|---|---|---|---|---|---|---|---|---|
| Harbor v2.7 - automatic scan | 7148930 | SCAN_ALL | 0 | Running | SCHEDULE | `{"summary":{"total_count":24745,"submit_count":18727,"conflict_count":945,"precondition_count":0,"unsupport_count":5073,"unknow_count":0}}` | 2023-08-17 00:00:01.962214 | | 60360 | 2023-08-17 09:02:32 |
| Harbor v2.7 - manual scan | 7148901 | SCAN_ALL | 0 | Error | MANUAL | `{"summary":{"total_count":24855,"submit_count":5508,"conflict_count":408,"precondition_count":0,"unsupport_count":18939,"unknow_count":0}}` | 2023-08-16 13:44:17.284105 | 2023-08-16 17:13:56 | 16613 | 2023-08-16 17:13:56 |
| Harbor v2.5 - automatic scan | 7148727 | SCAN_ALL | 0 | Error | SCHEDULE | `{"summary":{"total_count":1996,"submit_count":1898,"conflict_count":35,"precondition_count":0,"unsupport_count":61,"unknow_count":2}}` | 2023-08-10 00:00:01.754736 | 2023-08-10 00:58:43 | 4153 | 2023-08-10 00:58:43 |
| Harbor v2.5 - automatic scan | 7144700 | SCAN_ALL | 0 | Error | SCHEDULE | `{"summary":{"total_count":147,"submit_count":129,"conflict_count":0,"precondition_count":0,"unsupport_count":17,"unknow_count":1}}` | 2023-08-09 00:00:03.366405 | 2023-08-09 00:02:22 | 260 | 2023-08-09 00:02:22 |
| Harbor v2.5 - automatic scan | 7140768 | SCAN_ALL | 0 | Error | SCHEDULE | `{"summary":{"total_count":1339,"submit_count":1276,"conflict_count":31,"precondition_count":0,"unsupport_count":32,"unknow_count":0}}` | 2023-08-08 00:00:03.142852 | 2023-08-08 00:39:22 | 2835 | 2023-08-08 00:39:22 |
As we can observe, the total_count approaches the number of artifacts that Trivy supports (~25k vs ~29k), although I would expect the total_count to equal the total number of artifacts and the submit_count to equal the number of artifacts scanned by Trivy.
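One quick sanity check on these rows: in every SCAN_ALL row shown in this thread, submit_count + conflict_count + precondition_count + unsupport_count + unknow_count adds up exactly to total_count, so the summary itself is internally consistent and the open question is only how artifacts get distributed between the buckets. Below is a small sketch that parses one `extra_attrs` value and performs that check. The struct and function names are my own; only the JSON keys come from the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scanAllSummary mirrors the JSON keys seen in the extra_attrs column.
// The Go type itself is an illustration, not Harbor's internal type.
type scanAllSummary struct {
	TotalCount        int `json:"total_count"`
	SubmitCount       int `json:"submit_count"`
	ConflictCount     int `json:"conflict_count"`
	PreconditionCount int `json:"precondition_count"`
	UnsupportCount    int `json:"unsupport_count"`
	UnknowCount       int `json:"unknow_count"`
}

type extraAttrs struct {
	Summary scanAllSummary `json:"summary"`
}

// parseSummary decodes the extra_attrs JSON of a SCAN_ALL execution row.
func parseSummary(raw string) (scanAllSummary, error) {
	var ea extraAttrs
	if err := json.Unmarshal([]byte(raw), &ea); err != nil {
		return scanAllSummary{}, err
	}
	return ea.Summary, nil
}

// accountedFor sums every bucket except total_count.
func (s scanAllSummary) accountedFor() int {
	return s.SubmitCount + s.ConflictCount + s.PreconditionCount +
		s.UnsupportCount + s.UnknowCount
}

func main() {
	// extra_attrs of execution 7148930, copied from the table.
	raw := `{"summary":{"total_count":24745,"submit_count":18727,"conflict_count":945,"precondition_count":0,"unsupport_count":5073,"unknow_count":0}}`
	s, err := parseSummary(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("total=%d accounted=%d consistent=%t\n",
		s.TotalCount, s.accountedFor(), s.TotalCount == s.accountedFor())
}
```

The same check can be pointed at any other row by swapping the `raw` string.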
I will try and test patches https://github.com/goharbor/harbor/pull/18931 and https://github.com/goharbor/harbor/pull/18943 to see the difference.
The total_count seems much more consistent, but even with the patches the reported scans show only a fraction of the total artifacts.
In the table below, the last 2 reports were run with core v2.7 with the respective patches above:
| id | vendor_type | vendor_id | status | trigger | extra_attrs | start_time | end_time | revision | update_time |
|---|---|---|---|---|---|---|---|---|---|
| 7148968 | SCAN_ALL | 0 | Error | SCHEDULE | `{"summary":{"total_count":24682,"submit_count":2658,"conflict_count":84,"precondition_count":0,"unsupport_count":21940,"unknow_count":0}}` | 2023-08-18 00:00:02.970861 | 2023-08-18 00:55:58 | 5864 | 2023-08-18 00:55:59.049953 |
| 7148942 | SCAN_ALL | 0 | Error | MANUAL | `{"summary":{"total_count":24745,"submit_count":2761,"conflict_count":89,"precondition_count":0,"unsupport_count":21895,"unknow_count":0}}` | 2023-08-17 16:29:48.349199 | 2023-08-17 17:28:37 | 6106 | 2023-08-17 17:28:37 |
| 7148941 | SCAN_ALL | 0 | Error | MANUAL | `{"summary":{"total_count":24745,"submit_count":10284,"conflict_count":110,"precondition_count":0,"unsupport_count":14351,"unknow_count":0}}` | 2023-08-17 09:13:46.026438 | 2023-08-17 13:14:01 | 26974 | 2023-08-17 13:14:01 |
| 7148930 | SCAN_ALL | 0 | Error | SCHEDULE | `{"summary":{"total_count":24745,"submit_count":18727,"conflict_count":945,"precondition_count":0,"unsupport_count":5073,"unknow_count":0}}` | 2023-08-17 00:00:01.962214 | 2023-08-17 09:13:17 | 61668 | 2023-08-17 09:13:17 |
| 7148901 | SCAN_ALL | 0 | Error | MANUAL | `{"summary":{"total_count":24855,"submit_count":5508,"conflict_count":408,"precondition_count":0,"unsupport_count":18939,"unknow_count":0}}` | 2023-08-16 13:44:17.284105 | 2023-08-16 17:13:56 | 16613 | 2023-08-16 17:13:56 |
NOTE: I disabled the GC, but it seems that some artifacts still got dropped? (24745 -> 24682)
Still, this doesn't tell us anything about what is failing. Checking the tasks for the current execution id shows the same number of tasks and reports as presented in the scan_all dashboard results (as we had confirmed before).
SELECT *
FROM public.task
WHERE execution_id = 7148968
Is this an issue with submitting the task jobs from the SCAN_ALL artifact loop? Also, a last observation: the submit_count and the actual tasks assigned to the (latest) SCAN_ALL execution id don't match:
You might say that this is because of the patches? But the same still happens with the original v2.7 version. For
7148901 "SCAN_ALL" 0 "Error" "MANUAL" "{""summary"":{""total_count"":24855,""submit_count"":5508,""conflict_count"":408,""precondition_count"":0,""unsupport_count"":18939,""unknow_count"":0}}" "2023-08-16 13:44:17.284105" "2023-08-16 17:13:56" 16613 "2023-08-16 17:13:56"
And for
7148930 "SCAN_ALL" 0 "Running" "SCHEDULE" "{""summary"":{""total_count"":24745,""submit_count"":18727,""conflict_count"":945,""precondition_count"":0,""unsupport_count"":5073,""unknow_count"":0}}" "2023-08-17 00:00:01.962214" 60360 "2023-08-17 09:02:32"
which I stopped, as the total scheduled tasks were not increasing.
My logs show 22k occurrences of this error https://github.com/goharbor/harbor/blob/f21b1481bb5ba3efb9e3c1dd8c4e704d9dcc44a1/src/controller/scan/base_controller.go#L392 for the latest scan, which seems about right. However, the full error is
2023-08-18T00:55:59Z [ERROR] [/controller/scan/base_controller.go:391]: failed to scan artifact someartifact@sha, error the configured scanner Trivy does not support scanning artifact with mime type application/vnd.oci.image.manifest.v1+json
This is totally wrong. So what is the error exactly?
This section should check whether err != nil before printing the error log: https://github.com/goharbor/harbor/blob/f21b1481bb5ba3efb9e3c1dd8c4e704d9dcc44a1/src/controller/scan/base_controller.go#L359
Are there problems with caching the project metadata?
2023-08-18T16:00:35Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 1778 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:36Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 1778 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:37Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 1778 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:38Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 1009 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:43Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 9 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:44Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 9 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:45Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 1009 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:46Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 3149 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:47Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 9 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:48Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 9 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:49Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 9 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:50Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 9 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:51Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 1009 metadata from cache error: key not found:redis: nil, will query from database.
2023-08-18T16:00:52Z [DEBUG] [/pkg/cached/project_metadata/redis/manager.go:85]: get project 1009 metadata from cache error: key not found:redis: nil, will query from database.
I think the error comes from here. It points to a context timeout.
2023-08-18T00:55:59Z [ERROR] [/controller/scanner/base_controller.go:299][error="v1 client: get metadata: Get "http://trivy:8080/api/v1/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"]: failed to ping scanner
2023-08-18T00:55:59Z [ERROR] [/controller/scanner/base_controller.go:265]: api controller: get project scanner: scanner controller: ping: v1 client: get metadata: Get "trivy:8080/api/v1/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[299] points to https://github.com/goharbor/harbor/blob/main/src/controller/scanner/base_controller.go#L301
[265] points to https://github.com/goharbor/harbor/blob/main/src/controller/scanner/base_controller.go#L265C2-L265C30
This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.
Edit: Harbor Version v2.5.2 -- I know, I know, update, but please read on...
So, I have a daily ScanAll job activated, which does not seem to be scanning all objects. I did some investigation of the source code on my own, but arrived at no conclusion. First and foremost, what are the projects that the scanner triggers the scan jobs?
`Automatically scan images on push` option enabled?
Personally I am expecting that 1. happens. The documentation is not explicit on this: https://goharbor.io/docs/2.7.0/administration/vulnerability-scanning/scan-all-artifacts/
A while back, our production image CVE scanner was not working (https://github.com/aquasecurity/trivy/issues/3894, fixed now), and while newly pushed images are being actively scanned, the daily scanner does not pick up the images with a `Vulnerabilities` error state, even with `Automatically scan images on push` enabled. In this case, this is a proxy-cache repository.

So, if we check `Interrogation Service -> Vulnerabilities`, I see ~2000 images scanned.

Now, for the funzies part. I went through the source code starting with the scan_all.createOrUpdateScanAllSchedule call and arrived somewhere at:
`nil` (as are the options), so a new Query is instantiated with no filter parameters (only the pager) - nothing much to see.

From what I managed to find out, there doesn't seem to be any filter for a specific artifact type, which I find quite suspicious. I would expect the scan to use the manifest artifact, and then the scanner itself (Trivy) would pull all the dependent layers for the scan. The vulnerabilities would then be aggregated, and the vulnerability report would match the manifest only:
Here is an example of a query where one image is scanned and the other is not:
NOTE: If the image's first `artifact` is an index, it seems that the scanner reports are tied to the underlying manifests. So I infer that this should always be the case.

So I have two cases. But even though they are in the same repository, same image, the daily scan will not pick it up. I would like to know what kind of query is made here, so I can reproduce how many images `would be scanned` VS `should be scanned`. It seems that some filtering is applied, but I didn't manage to pick it up.

For consideration, these are my total artifact numbers:
AND
Note: the number of artifacts obtained with the commented-out parameter is very similar.
Can you help?