fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

False positive in vulnerability processing. Amazon Linux #20934

Closed · qwerty1q2w closed this issue 6 days ago

qwerty1q2w commented 1 month ago

Fleet version: Fleet 4.53.1 • Go go1.22.4
Web browser and operating system: Firefox and Ubuntu 22.04


💥  Actual behavior

Host is fully upgraded but FleetDM shows strange results.

(screenshot of the vulnerabilities reported for the host)

cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"

amazon_vulns.csv

🧑‍💻  Steps to reproduce

  1. Connect Amazon Linux host to FleetDM
  2. Check vulnerabilities
lukeheath commented 1 month ago

@qwerty1q2w Thank you for filing this bug! We will reproduce on our end and prioritize a fix.

sharon-fdm commented 1 month ago

Hey team! Please add your planning poker estimate with Zenhub @getvictor @lucasmrod @mostlikelee

iansltx commented 1 month ago

I'm running into a (pretty sure unrelated) panic on pulling vuln information within the LoadCVEMeta function when attempting to repro this (by running ./build/fleet vuln_processing --dev --mysql_address=127.0.0.1:3310 --dev-license against the docker dev setup), but I was able to confirm that Amazon Linux does show up as itself in e.g. the DB (operating_systems table):

+----+--------------+----------+--------+-----------------+----------+-----------------+---------------+
| id | name         | version  | arch   | kernel_version  | platform | display_version | os_version_id |
+----+--------------+----------+--------+-----------------+----------+-----------------+---------------+
|  1 | Amazon Linux | 2.0.0    | x86_64 | 6.6.26-linuxkit | amzn     |                 |             1 |
|  2 | Amazon Linux | 2023.0.0 | x86_64 | 6.6.26-linuxkit | amzn     |                 |             2 |
+----+--------------+----------+--------+-----------------+----------+-----------------+---------------+
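For reference, that's just a straight read of the operating_systems table; a standalone check against the dev MySQL looks roughly like the sketch below (the DSN user/password/database name are placeholders for whatever your local docker setup uses):

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver for database/sql
)

func main() {
	// Placeholder credentials/database name; 3310 is the docker dev MySQL port mentioned above.
	db, err := sql.Open("mysql", "fleet:insecure@tcp(127.0.0.1:3310)/fleet")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(`SELECT id, name, version, platform, os_version_id FROM operating_systems`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id, osVersionID int64
		var name, version, platform string
		if err := rows.Scan(&id, &name, &version, &platform, &osVersionID); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, name, version, platform, osVersionID)
	}
}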

Checked with both AL2 (x86_64 and aarch64) and AL2023 (x86_64 only) as, thanks to the [Docker Library images](https://hub.docker.com/_/amazonlinux), base versions are easy enough to pull (and while those are slim images I expect we'll be able to repro at least some of the CVEs from them, though I'm assuming this was observed on a full EC2 instance rather than inside a container).

Will update here once my local env isn't throwing spurious errors and I'm able to repro.

iansltx commented 1 month ago

I'm able to repro this; Amazon Linux 2 in Docker gets a bunch of vulnerabilities reported, while Amazon Linux 2023 doesn't since there's nothing mapping it.

My working hypothesis is that AWS has OVALs for Amazon Linux, and we can use those to get correct vuln info, given that they will have backported security fixes to the stuff they bundle.

iansltx commented 1 month ago

Looks like https://vuls.io has figured out how to pull Amazon Linux data in https://github.com/vulsio/goval-dictionary, so we should be able to adapt that for versions of AL up through 2023.

By the way, @qwerty1q2w thanks for the CSV output! What tool did you use to pull that? We should (and don't) have that available via our API and/or UI.

iansltx commented 1 month ago

After a bit more research here, we're definitely pulling an insufficient feed, as Amazon has their own feed (ALAS/Amazon Linux Security Center), in their own format. Each ALAS entry references one or more CVEs, and includes lists of packages that were updated to resolve those vulnerabilities; it does appear that Amazon is backporting fixes to versions that standard RHEL OVALs would mark as vulnerable.

My initial thought here is that we'll want to stop using the OVAL method to validate Amazon Linux vulnerabilities (and notify customers that they'll stop getting vuln info for AL2 boxes, as that information wasn't precise anyway), then ingest ALAS information into our vulnerabilities artifacts, if we can get CPEs out of the information in ALAS entries. If we can take that route, we can clean up the process for everyone pulling from our vulnerabilities feed (and our OVAL mapping).

If we can't go from ALASes to CPEs, that probably leaves us with needing to add a custom vuln check method, which would do something similar to the fetch in goval-dictionary and then compare versions from there. We may still want to either preprocess the ALAS feed or have a file in nvd that maps to the correct URLs for various AL versions as those apparently may change.

Now, we might want to keep the RHEL 7 OVAL link in place rather than dropping it if dropping it qualifies as "failing open". There's just going to be a bunch of noise in the vuln feed for any AL2 hosts until we get this fixed. Also of note, we don't have an OS mapping at all for AL2023 (nor AL1, but hopefully no one's using that anymore).

Either way, getting this fixed the right way will be nontrivial; I'm guessing the fix would be of similar order of magnitude to #9386, though with effort in different locations.

qwerty1q2w commented 1 month ago

@iansltx It is an API query, and then I added the actual_HOST_Version from the host.

https://fleetdm.com/docs/rest-api/rest-api#list-hosts https://fleetdm.com/docs/rest-api/rest-api#get-host


import requests

# base_url, headers (with the API token), and host_id are assumed to be defined earlier in the script.
host_response = requests.get(f'{base_url}/fleet/hosts/{host_id}', headers=headers)
host_data = host_response.json()
for software_item in host_data.get('host', {}).get('software', []):
    if software_item.get('vulnerabilities'):
        for vulnerability in software_item['vulnerabilities']:
            ...  # rest of the loop body (presumably building the CSV rows) truncated in the original snippet

lucasmrod commented 1 month ago

@iansltx Thanks for the analysis. I agree with your points.

I agree that we'll need to do this in two stages:

  1. First, exclude Amazon Linux from vulnerability processing altogether and declare/document it as unsupported. (This will get rid of the false positives and make it clear the platform is not yet supported.) We should do this part this sprint IMO.
  2. Second, plan/estimate the level of effort to get and process vulnerabilities for Amazon Linux hosts using ALAS (https://alas.aws.amazon.com/alas.rss). I'm guessing we could do some pre-processing of ALAS in https://github.com/fleetdm/nvd or https://github.com/fleetdm/vulnerabilities and then have Fleet pull that pre-processed information and run vulnerability processing on the hosts. (A rough sketch of pulling that feed follows below.)
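For a rough sense of what consuming that feed involves, a minimal pull of the RSS looks like the sketch below; the struct only models the standard RSS elements, while the real ALAS entries carry the severity/CVE/package details we'd actually need, so this is just to show the shape of the fetch:

package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)

// Only the standard RSS elements are modeled here; ALAS entries embed more
// detail (severity, CVE IDs, affected packages) in the description/linked page.
type alasFeed struct {
	Items []struct {
		Title       string `xml:"title"`
		Link        string `xml:"link"`
		Description string `xml:"description"`
	} `xml:"channel>item"`
}

func main() {
	resp, err := http.Get("https://alas.aws.amazon.com/alas.rss")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var feed alasFeed
	if err := xml.NewDecoder(resp.Body).Decode(&feed); err != nil {
		log.Fatal(err)
	}
	for _, item := range feed.Items {
		fmt.Println(item.Title, item.Link)
	}
}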

Pinging @noahtalerman / @sharon-fdm because this will require some discussion with the product team.

iansltx commented 1 month ago

Added https://github.com/fleetdm/nvd/pull/33 for item 1 above.

iansltx commented 1 month ago

Just talked through this with @sharon-fdm. Technically we have three options: the two Lucas outlined (either in the order he outlined, or delaying the first until closer to when the second is complete), plus "add one-off exclusions for each of these versions manually".

The third option isn't sustainable since we're talking about a whole OS worth of dependencies that Amazon has backports for that don't correspond to upstream RHEL.

That leaves us with questions of:

  1. When do we drop AL2 from our OVAL URL mapping?
  2. When do we prioritize pulling ALAS data?

Between me spinning up in general and this being a nontrivial fix, if this winds up on my plate (which I'm fine with), we're probably looking at a 13-pointer. In return for that, we should be able to not only fix AL2 but also add AL2023 support, since once we have feed parsing set up for ALAS, 2023 is actually easier to pull than 2.

/cc @noahtalerman @sharon-fdm @mostlikelee

sharon-fdm commented 1 month ago

Thanks, @lucasmrod and @iansltx. I am in favor of doing it right: either supporting Amazon Linux properly or not at all, rather than patching around it.

If Ian believes it's a 13-pointer, I'm in favor of doing it. @noahtalerman, it should be your call. We can do a parking lot in the coming standup.

sharon-fdm commented 1 month ago

From @noahtalerman : https://fleetdm.com/docs/using-fleet/vulnerability-processing

sharon-fdm commented 1 month ago

From @getvictor as reference:

#21243

iansltx commented 3 weeks ago

Talked with @mostlikelee on Friday; the initial idea of merging ALAS data into the existing NVD (really NVD + vulncheck) feed would wind up being rather risky in terms of tracking down data sources for bugs. Keeping things separate, a la OVAL (with data only pulled for distros/versions actually in use), seems cleaner.

The current idea would be to:

  1. Run AL fetches for each version via goval-dictionary, in the vulnerabilities repo Actions workflow, outputting sqlite files (goval-dictionary can fetch multiple OSes/versions into a single database but for our artifact-based use case we want to keep each OS/version separate, same as OVALs)
  2. xz each sqlite file (e.g. AL2 is ~10MB unzipped, 2.2MB gz, 1.1MB xz) and include in the vulnerabilities release, plus a JSON mapping file similar to what we have for OVALs
  3. When pulling vulnerability source data on Fleet server, grab the relevant sqlite file(s) for AL, similar to OVAL lookups, but use the sqlite file directly for software matching rather than converting into an intermediate format

The big question is whether that sqlite file includes enough information to allow us to validate software, as the ALAS feed at least doesn't use CPEs. This is the next thing I'll be finding out. I can start with step 3 of the above, using manually generated sqlite files (uncompressed for now), and if that process works we can add the automations/XZ compression in steps 1 and 2.
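To make the step-3 spike concrete, the shape I'm starting from is roughly the following; the table/column names in the query are placeholders until the goval-dictionary schema is confirmed, and the sqlite driver choice is just for illustration:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // sqlite driver, illustration only
)

func main() {
	db, err := sql.Open("sqlite3", "./amzn_02.sqlite3") // file name is a placeholder
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Placeholder table/column names; the real ones come from the goval-dictionary schema.
	rows, err := db.Query(`SELECT name, version FROM packages WHERE name = ?`, "openssl")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var name, fixedVersion string
		if err := rows.Scan(&name, &fixedVersion); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s is fixed in %s\n", name, fixedVersion)
	}
}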

The tradeoff with the above is that this will require more code on the Fleet server side to ingest/compare a new feed, which means that we won't be able to backport the new source to existing installs. This also means the timing of when I can get this in matters a lot more, since an explicit release would be required. If this misses 4.56.0, the question is whether this size of functionality should land in 4.56.1 or wait until 4.57.0.

This also brings up the question of what to do with everyone on older versions; do we:

  1. Pull the OVAL reference for amzn_02 (which would remove vuln reporting entirely for AL2 on older versions)
  2. Replace it with something obvious like "CVE-UPGRADE-YOUR-FLEET"
  3. Keep it pointed as-is to maintain existing behavior, with a Fleet server-side rule to prefer the sqlite-based sources over OVALs where they exist

iansltx commented 3 weeks ago

Looking at how OVAL pulls are implemented, we may need to keep the key in the nvd repo anyway, as the supported OS list is hard-coded in oval_platform.go. Need to check what happens if on OVAL pull we e.g. drop amzn_02 while there are still Fleet servers using the current logic with AL2 hosts in the wild.

iansltx commented 3 weeks ago

Ran through the OVAL download process with a tweaked fork of nvd. Removing the OVAL association just spits out a few extra errors, skips a few debug-level logs, and effectively no-ops vuln checking for that particular OS version, and only that version (which is more or less what's desired).

So a valid deprecation strategy for the wrong OVALs would be to contact folks with AL2 hosts about the upgrade, give a date for browning out that OVAL mapping entry for X hours of one day, and then remove it fully/permanently later.

iansltx commented 3 weeks ago

Confirmed that we already support download-and-extract for xz files as well as bz2/gz, so nothing stopping us from using that compression method here.
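For reference, on the consuming side that just means wrapping the download stream in an xz reader before writing the sqlite file to disk. A minimal sketch follows; the URL is a placeholder, and the choice of github.com/ulikunitz/xz here is illustrative rather than necessarily what Fleet's existing download code uses:

package main

import (
	"io"
	"log"
	"net/http"
	"os"

	"github.com/ulikunitz/xz" // illustrative xz package choice
)

func main() {
	// Placeholder URL standing in for a vulnerabilities-release asset.
	resp, err := http.Get("https://example.com/amzn_02.sqlite3.xz")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	xzr, err := xz.NewReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	out, err := os.Create("amzn_02.sqlite3")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, xzr); err != nil {
		log.Fatal(err)
	}
}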

iansltx commented 3 weeks ago

Talked with @mostlikelee earlier today and it looks like having a workflow somewhat similar to what we do for OVALs will work here:

  1. Run goval-dictionary in the vulnerabilities repo to build sqlite DB(s) in its format, then XZ the file(s)
  2. Pull to local during the vuln processing job
  3. For each AL version, pull installed software and compare versions using https://github.com/fleetdm/fleet/blob/main/server/vulnerabilities/utils/rpmvercmp.go against the database of "this is the fixed version for this CVE"; lower versions get flagged, >= is clean. We can handle this per arch, since the feed lists packages by arch and that data gets included in the sqlite DB (see the sketch right after this list).
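To spell out the comparison in step 3, the matching itself is just "installed version vs. fixed version" per package. A minimal sketch follows; the type and field names are made up for illustration, and the toy comparator stands in for the real rpmvercmp logic:

package main

import "fmt"

// vercmp stands in for the comparison helper in
// server/vulnerabilities/utils/rpmvercmp.go (which handles epochs, tildes,
// and mixed alpha/numeric segments; the toy comparator below does not).
type vercmp func(a, b string) int

// fixEntry is an illustrative stand-in for one "package X is fixed in version Y
// for CVE Z" row from the goval-dictionary sqlite DB; field names are made up.
type fixEntry struct {
	Package string
	Fixed   string
	CVE     string
}

// flagVulns returns the CVEs whose fix version is newer than what's installed.
func flagVulns(installed map[string]string, fixes []fixEntry, cmp vercmp) []string {
	var cves []string
	for _, f := range fixes {
		v, ok := installed[f.Package]
		if !ok {
			continue // package not installed on this host
		}
		if cmp(v, f.Fixed) < 0 { // installed < fixed => still vulnerable
			cves = append(cves, f.CVE)
		}
	}
	return cves
}

func main() {
	// Toy lexicographic comparator, for the example only.
	cmp := func(a, b string) int {
		switch {
		case a < b:
			return -1
		case a > b:
			return 1
		}
		return 0
	}
	// Made-up package/version/CVE values, purely illustrative.
	installed := map[string]string{"examplepkg": "1.0-1"}
	fixes := []fixEntry{{Package: "examplepkg", Fixed: "1.0-2", CVE: "CVE-XXXX-0001"}}
	fmt.Println(flagVulns(installed, fixes, cmp)) // => [CVE-XXXX-0001]
}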

Note that we are not using CPEs here at all, just comparing Linux package versions, same as OVAL. We're also evaluating one OS-version combo at a time, same as OVAL. The difference from OVAL is that we'll be doing file processing in GitHub Actions rather than doing so locally on every machine.

Once I have things minimally working (as mentioned before, I can focus on step 3 for now), I can test with earlier Docker tags of AL2, containing vulnerable packages, to confirm that we get the correct true positives. The latest Docker tag should have a clean run with this method, for a false positive check.

After I get AL2 working, other AL versions available via Docker should be quick enough to add on, so they'll probably be in scope.


For goval-dictionary usage, my initial thought is that we manually specify the most recent tag of goval-dictionary (currently v0.9.5) and pull directly from the upstream repo. If we want more control here we could fork it into our own GitHub org and do the same thing. The library is maintained well enough (and there may be opportunities to contribute upstream), but:

  1. Releases may be tagged less often than we'd like (though there's nothing later than v0.9.5 that would affect Amazon Linux fetching from what I can tell, and I've pinged the primary maintainer to see if they can tag a release...this might be 0.10.0 due to changes in the Redis format?)
  2. Since the generated sqlite files are going to be consumed directly by the Fleet server, we want to ensure any schema changes are backward compatible (BC) with respect to the tables we touch

/cc @lukeheath on what posture we want to take here. I don't think we want to pull code into our codebase for this as it's actively maintained (see: why we might want something newer than 0.9.5). Also, we're only using the lib inside our vuln feed build rather than having it as a direct dep of fleet server at runtime.


For sqlite building, I need to see what the size difference is between "grab all AL versions in one DB file" vs. "grab each individually". Will post a comparison here shortly.

iansltx commented 3 weeks ago

Uncompressed:

-rw-r--r--    1 ian  admin    22M Aug 19 11:56 all.sqlite3
-rw-r--r--    1 ian  admin   7.1M Aug 19 11:59 al1.sqlite3
-rw-r--r--    1 ian  admin   9.8M Aug 19 11:59 al2.sqlite3
-rw-r--r--    1 ian  admin   3.4M Aug 19 12:02 al2023.sqlite3
-rw-r--r--    1 ian  admin   1.8M Aug 19 12:02 al2022.sqlite3

XZ'd:

-rw-r--r--    1 ian  admin   2.4M Aug 19 11:56 all.sqlite3.xz
-rw-r--r--    1 ian  admin   902K Aug 19 11:59 al1.sqlite3.xz
-rw-r--r--    1 ian  admin   1.1M Aug 19 11:59 al2.sqlite3.xz
-rw-r--r--    1 ian  admin   473K Aug 19 12:02 al2023.sqlite3.xz
-rw-r--r--    1 ian  admin   230K Aug 19 12:02 al2022.sqlite3.xz

We save a few hundred KB XZ'd by including everything in a single (compressed) DB, but that DB is larger than any two single DBs combined, and customers are likely not to be running enough versions to come out ahead here (they're either running AL2 and maybe 2022/2023 or just 2023 at this point). So sticking with one DB per version looks to be the way forward.

iansltx commented 3 weeks ago

Also, we may get a v0.10 tag of goval-dictionary; it would be 0.10 rather than 0.9.6 because they changed the key structure for the Redis fetch output, and the data storage format appears to be in scope for BC there (which is good for us, as it means we can trust semver-minor updates not to break sqlite exports).

lukeheath commented 3 weeks ago

> /cc @lukeheath on what posture we want to take here. I don't think we want to pull code into our codebase for this as it's actively maintained (see: why we might want something newer than 0.9.5). Also, we're only using the lib inside our vuln feed build rather than having it as a direct dep of fleet server at runtime.

Up to @sharon-fdm

sharon-fdm commented 3 weeks ago

@iansltx, as agreed, let's review this together in your design review and ratify the plans / answer questions.

iansltx commented 3 weeks ago

Attaching the goval-dictionary sqlite ERD for quick reference. It appears that we have enough info to take a simplified path that riffs off of the OVAL code. I have a WIP pushed to confirm I'm going in the right direction this time. Still need to add the vulns repo ETL (for GitHub Actions), but that should be the easier part.

(ERD of the goval-dictionary sqlite schema)

iansltx commented 3 weeks ago

As the RC for 4.56.0 has already been cut and the work here is nontrivial, moving this to 4.57.0; this will require a server upgrade since we're shipping the goval-dictionary sqlite DB down to the Fleet server and querying it there.

I have a working (albeit ugly) analysis step built, following a simplified version of the OVAL pattern. Verified by pulling an older tag of AL2 from Docker and vuln-scanning that, then running yum -y update and confirming that the vulns went away.

Next step is to get the goval-dictionary outputs generated in the vulnerabilities repo workflow; I'm testing that now. Once I confirm that that works with AL2 alone, I'll add 1, 2022, and 2023 in as the extra effort is minimal.

Once that's done, the final functional piece (ignoring code cleanup) is downloading/extracting the correct sqlite DBs. Might get to that tonight.

iansltx commented 3 weeks ago

Got the build working. https://github.com/iansltx/vulnerabilities/actions/runs/10501203506 -> https://github.com/iansltx/vulnerabilities/releases/tag/cve-202408220322

iansltx commented 3 weeks ago

PR open for vulnerabilities repo workflow changes: https://github.com/fleetdm/vulnerabilities/pull/14

Future work there would be parallelizing NVD and goval-dictionary into separate jobs and splitting the release out into a job that depends on both, to get the ~minute of build time back (more if we add more goval-dictionary pulls).

TODO:

iansltx commented 3 weeks ago

Switched the milestone back here since the window to cherry-pick features/feature-ish things into 4.56.0 was lengthened.

fleet-release commented 6 days ago

Amazon Linux host,
False positive resolved, peace.
Trust in Fleet restored.

qwerty1q2w commented 4 days ago

Thanks to all! You are really fast. Is there a way to clean up all the old info in the Software section? I upgraded my instance to the latest version (4.56.0) but still see the old vulns. (screenshot)

iansltx commented 3 days ago

@qwerty1q2w Just to make sure, you've confirmed that vulnerabilities have been scanned since the 4.56.0 upgrade? Assuming that's the case, I've added #21947 to track cleanup of the false positives for hosts that were already enrolled/scanned at the time of an upgrade.

qwerty1q2w commented 3 days ago

@iansltx I updated FleetDM on September 9, 2024, at 10:41 GMT. (screenshots)

    env_file: ".env"
    image: fleetdm/fleet:v${FLEET_VERSION}
    privileged: false
    user: "100:101"
    command: "fleet serve"
    environment: &fleet_environment
      - USER=fleet
      - FLEET_MYSQL_ADDRESS=mysql:3306
      - FLEET_MYSQL_DATABASE=${FLEET_MYSQL_DATABASE}
      - FLEET_MYSQL_USERNAME=${FLEET_MYSQL_USERNAME}
      - FLEET_MYSQL_PASSWORD=${FLEET_MYSQL_PASSWORD}
      - FLEET_REDIS_ADDRESS=redis:6379
      - FLEET_SERVER_ADDRESS=0.0.0.0:8080
      - FLEET_SERVER_TLS=false
      - FLEETDM_JWT_KEY=${FLEET_JWT_KEY}
      - FLEET_OSQUERY_LABEL_UPDATE_INTERVAL=1h
      - FLEET_OSQUERY_DETAIL_UPDATE_INTERVAL=4h
      - FLEET_LOGGING_JSON=true
      - FLEET_OSQUERY_STATUS_LOG_PLUGIN=filesystem
      - FLEET_OSQUERY_RESULT_LOG_PLUGIN=filesystem
      - FLEET_FILESYSTEM_STATUS_LOG_FILE=/logs/osqueryd.status.log
      - FLEET_FILESYSTEM_RESULT_LOG_FILE=/logs/osqueryd.results.log
      - FLEET_FILESYSTEM_ENABLE_LOG_ROTATION=true
      - FLEET_VULNERABILITIES_DATABASES_PATH=/home/fleet/fleet_data
      - FLEET_VULNERABILITIES_PERIODICITY=1h
      - FLEET_OSQUERY_POLICY_UPDATE_INTERVAL=1h
    container_name: fleet-webgui
    restart: "always"
    volumes:
      - /srv/fleet/fleetlogs:/logs
      # cve data
      - ./fleet_data:/home/fleet/fleet_data
    networks:
      - fleet-backend

Do you need anything else to confirm it's not just a problem with my instance?