anchore / grype

A vulnerability scanner for container images and filesystems
Apache License 2.0

Duplicate CVE due to --by-cve lacks FIXED-IN data #1202

Open ghost opened 1 year ago

ghost commented 1 year ago

What happened:

When using --by-cve, the same CVE is reported twice: once with FIXED-IN data and once without.

What you expected to happen:

I expect the same CVE to be reported just once, while including the FIXED-IN data.

How to reproduce it (as minimally and precisely as possible):

grype docker.io/s3curitybug/jackson-databind-cves --by-cve | grep CVE-2020-9547
 ✔ Vulnerability DB        [no update available]
 ✔ Loaded image            
 ✔ Parsed image            
 ✔ Cataloged packages      [306 packages]
 ✔ Scanning image...       [925 vulnerabilities]
   ├── 109 critical, 246 high, 319 medium, 112 low, 132 negligible (7 unknown)
   └── 621 fixed
jackson-databind          2.8.8                                           java-archive  CVE-2020-9547     Critical    
jackson-databind          2.8.8                2.9.10.4                   java-archive  CVE-2020-9547     Critical 

As you can see above, the fixed-in version (2.9.10.4) is reported for this CVE only in the second occurrence.

Anything else we need to know?:

Thanks for fixing the performance issues in https://github.com/anchore/grype/issues/1185.

Environment:

tgerla commented 1 year ago

Hi @JipSogeti, thanks for the report. You're getting two rows for this CVE because it is matched from both the NVD database and GHSA. The NVD result doesn't have the fixed version, but the GHSA result does. Our fix will probably be to merge these two rows, which should give you the results you expect. We'll put this in our issue backlog and get to it when we can. Thanks again.
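For illustration only, the row merge described above could look roughly like the following Python sketch. It is not grype's implementation: the `merge_rows` helper and the row dictionary layout (`name`, `installed`, `fixed_in`, `type`, `cve`, `severity`) are hypothetical and simply mirror the columns of the table output in the report.

```python
def merge_rows(rows):
    """Merge rows that describe the same package and CVE.

    Hypothetical sketch: rows are plain dicts mirroring the table columns.
    When two rows share a key, the first row is kept and its empty
    fixed-in version is filled from the later row if that one has a value.
    """
    merged = {}
    for row in rows:
        key = (row["name"], row["installed"], row["type"], row["cve"])
        if key not in merged:
            merged[key] = dict(row)  # copy so the input rows stay untouched
        elif not merged[key]["fixed_in"] and row["fixed_in"]:
            merged[key]["fixed_in"] = row["fixed_in"]
    return list(merged.values())


# The two rows from the report: one without and one with a fixed-in version.
rows = [
    {"name": "jackson-databind", "installed": "2.8.8", "fixed_in": "",
     "type": "java-archive", "cve": "CVE-2020-9547", "severity": "Critical"},
    {"name": "jackson-databind", "installed": "2.8.8", "fixed_in": "2.9.10.4",
     "type": "java-archive", "cve": "CVE-2020-9547", "severity": "Critical"},
]

for row in merge_rows(rows):
    print(row["name"], row["installed"], row["fixed_in"], row["cve"])
```

Keying on package name, installed version, type, and CVE keeps rows for different packages apart while folding the NVD and GHSA results for one package into a single row.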

ghost commented 1 year ago

Thanks! That sounds good.

ghost commented 1 year ago

In the meantime, here's a Python snippet (saved as dedup.py in the commands below) that filters out the duplicate CVEs and keeps only the match whose fix state is not unknown.

import sys
import json

data = json.load(sys.stdin)
unique_matches = {}

for match in data['matches']:
    # Use vulnerability ID and artifact data as unique key
    key = f"{match['vulnerability']['id']}_{json.dumps(match['artifact'], sort_keys=True)}"
    unique_match = unique_matches.get(key)

    # Keep this match if it is new, or if it is a duplicate (same
    # vulnerability ID, same artifact) that replaces an 'unknown' fix
    # state with a known one.
    if (unique_match is None or
            (unique_match['vulnerability']['fix']['state'] == 'unknown' and
             match['vulnerability']['fix']['state'] != 'unknown')):
        # Add match (or overwrite unique_match)
        unique_matches[key] = match

data['matches'] = list(unique_matches.values())

print(json.dumps(data, sort_keys=True, indent=2))
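To see which matches a filter like this would touch, a companion sketch along these lines could list every vulnerability/artifact pair that is reported with more than one fix state. The `find_conflicting_fix_states` helper and the inline sample report are hypothetical. The sample uses only the fields the snippet above reads, plus an artifact `name` field that is assumed to be present, and `CVE-0000-0000` is a placeholder ID.

```python
import json
from collections import defaultdict


def find_conflicting_fix_states(report):
    """Return (CVE ID, artifact name, fix states) for every
    vulnerability/artifact pair reported with more than one fix state."""
    states = defaultdict(set)
    names = {}
    for match in report["matches"]:
        artifact = match["artifact"]
        # Same key as the dedup snippet: vulnerability ID plus artifact data.
        key = (match["vulnerability"]["id"], json.dumps(artifact, sort_keys=True))
        states[key].add(match["vulnerability"]["fix"]["state"])
        names[key] = artifact.get("name", "")  # "name" is an assumed field
    return [(key[0], names[key], sorted(found))
            for key, found in states.items() if len(found) > 1]


# Hand-written sample standing in for `grype ... -o json` output.
artifact = {"name": "jackson-databind", "version": "2.8.8", "type": "java-archive"}
sample_report = {
    "matches": [
        {"vulnerability": {"id": "CVE-2020-9547", "fix": {"state": "unknown"}},
         "artifact": artifact},
        {"vulnerability": {"id": "CVE-2020-9547", "fix": {"state": "fixed"}},
         "artifact": artifact},
        # Placeholder ID for a match that has no duplicate.
        {"vulnerability": {"id": "CVE-0000-0000", "fix": {"state": "fixed"}},
         "artifact": artifact},
    ]
}

for cve, name, found in find_conflicting_fix_states(sample_report):
    print(cve, name, found)
```

For a real scan, `sample_report` would be replaced by `json.load(sys.stdin)` and the script placed at the end of the same pipe as the dedup script.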

Original grype JSON output (shows the duplicate CVE with both fixed and unknown fix states):

grype docker.io/s3curitybug/jackson-databind-cves --by-cve -o json | jq '.matches[].vulnerability | (.id + " | " + .fix.state)' | grep CVE-2020-9547
 ✔ Vulnerability DB        [no update available]
 ✔ Loaded image            
 ✔ Parsed image            
 ✔ Cataloged packages      [306 packages]
 ✔ Scanning image...       [980 vulnerabilities]
   ├── 104 critical, 249 high, 337 medium, 120 low, 153 negligible (17 unknown)
   └── 645 fixed
"CVE-2020-9547 | fixed"
"CVE-2020-9547 | unknown"

Filtered output (shows only the CVE with a known fix state):

grype docker.io/s3curitybug/jackson-databind-cves --by-cve -o json | python3 dedup.py | jq '.matches[].vulnerability | (.id + " | " + .fix.state)' | grep CVE-2020-9547
 ✔ Vulnerability DB        [no update available]
 ✔ Loaded image            
 ✔ Parsed image            
 ✔ Cataloged packages      [306 packages]
 ✔ Scanning image...       [980 vulnerabilities]
   ├── 104 critical, 249 high, 337 medium, 120 low, 153 negligible (17 unknown)
   └── 645 fixed
"CVE-2020-9547 | fixed"