anchore / vunnel

Tool for collecting vulnerability data from various sources (used to build the grype database)
Apache License 2.0

Avoid scraping HTML in the amazon provider #8

Open wagoodman opened 1 year ago

wagoodman commented 1 year ago

Today, the amazon provider (ported from enterprise) scrapes the HTML posted at https://alas.aws.amazon.com/. However, this can be improved by reading the repository metadata directly:

# get the release versions...
$ curl -O https://al2022-repos-us-west-2-9761ab97.s3.dualstack.us-west-2.amazonaws.com/core/releasemd.xml

# this file lists versions like "2022.0.20221207"; use one of them to look up the mirror list
$ curl -O https://al2022-repos-us-east-1-9761ab97.s3.dualstack.us-east-1.amazonaws.com/core/mirrors/2022.0.20221207/x86_64/mirror.list

# the mirror list contains the URL to get the "repomd" index
$ curl -O https://al2022-repos-us-east-1-9761ab97.s3.dualstack.us-east-1.amazonaws.com/core/guids/581859ea114d36f96a58435ad4169541fe3fccb88e0130c85b3ed542a34171a2/x86_64/repodata/repomd.xml

# that index contains the checksums and location for "updateinfo" (contains vulnerability info)
$ curl -O https://al2022-repos-us-east-1-9761ab97.s3.dualstack.us-east-1.amazonaws.com/core/guids/581859ea114d36f96a58435ad4169541fe3fccb88e0130c85b3ed542a34171a2/x86_64/repodata/updateinfo.xml.gz

$ gunzip updateinfo.xml.gz
$ head updateinfo.xml
<?xml version="1.0" ?>
<updates><update status="final" version="1.4" author="linux-security@amazon.com" type="security" from="linux-security@amazon.com"><id>ALAS2022-2021-001</id><title>Amazon Linux 2022 - ALAS2022-2021-001: Medium priority package update for vim</title><issued date="2021-10-26 02:25" /><updated date="2021-10-27 00:24" /><severity>Medium</severity><description>Package updates are available for Amazon Linux 2022 that fix the following vulnerabilities:
CVE-2021-3875:
        There's an out-of-bounds read flaw in Vim's ex_docmd.c. An attacker who is capable of tricking a user into opening a specially crafted file could trigger an out-of-bounds read on a memmove operation, potentially causing an impact to application availability.
2014661: CVE-2021-3875 vim: heap-based buffer overflow
...
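
The repomd.xml lookup in the last step could also be done programmatically. Below is a minimal sketch, assuming the index follows the usual createrepo layout: a `data` element with `type="updateinfo"` holding `checksum` and `location` children in the `http://linux.duke.edu/metadata/repo` namespace. The sample document, its checksum values, and the `find_updateinfo` helper are illustrative and not taken from the Amazon feed.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Namespace used by createrepo-style repomd.xml files (assumed here).
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hand-written stand-in for a real repomd.xml; checksums are placeholders.
SAMPLE_REPOMD = b"""<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <checksum type="sha256">aaaa</checksum>
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="updateinfo">
    <checksum type="sha256">bbbb</checksum>
    <location href="repodata/updateinfo.xml.gz"/>
  </data>
</repomd>
"""


class UpdateInfoRef(NamedTuple):
    href: str            # path relative to the repository root
    checksum_type: str   # e.g. "sha256"
    checksum: str        # hex digest of the compressed file


def find_updateinfo(repomd_xml: bytes) -> UpdateInfoRef:
    """Return the location and checksum of the updateinfo entry."""
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        if data.get("type") != "updateinfo":
            continue
        location = data.find("repo:location", REPO_NS)
        checksum = data.find("repo:checksum", REPO_NS)
        if location is None or checksum is None:
            break  # entry exists but is incomplete; fall through to the error
        return UpdateInfoRef(
            href=location.get("href", ""),
            checksum_type=checksum.get("type", ""),
            checksum=(checksum.text or "").strip(),
        )
    raise ValueError("no usable updateinfo entry found in repomd.xml")


ref = find_updateinfo(SAMPLE_REPOMD)
print(ref.href, ref.checksum_type, ref.checksum)
```

The returned `href` would be joined onto the repository base URL from mirror.list, and the checksum could be used to verify the download before decompressing it.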

TimBrown1611 commented 1 month ago

I think some of the information (such as the description) doesn't exist anywhere besides the HTML page. I suggest verifying this before changing how vunnel extracts the data.
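
One way to run that check is to parse updateinfo.xml and report which fields each record actually carries. Below is a minimal sketch, assuming only the elements visible in the excerpt above (`id`, `issued`, `severity`, `description`). The sample is that excerpt, shortened and with closing tags added by hand so it parses. `summarize_updates` and `CVE_PATTERN` are hypothetical names, and pulling CVE IDs out of the description text with a regex is an illustration, not how vunnel does it.

```python
import re
import xml.etree.ElementTree as ET

# Matches identifiers such as CVE-2021-3875 inside free text.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")

# Shortened copy of the excerpt above; closing tags were added by hand.
SAMPLE_UPDATEINFO = b"""<?xml version="1.0" ?>
<updates><update status="final" version="1.4" type="security"><id>ALAS2022-2021-001</id><title>Amazon Linux 2022 - ALAS2022-2021-001: Medium priority package update for vim</title><issued date="2021-10-26 02:25" /><updated date="2021-10-27 00:24" /><severity>Medium</severity><description>Package updates are available for Amazon Linux 2022 that fix the following vulnerabilities:
CVE-2021-3875:
        There's an out-of-bounds read flaw in Vim's ex_docmd.c.
2014661: CVE-2021-3875 vim: heap-based buffer overflow
</description></update></updates>
"""


def summarize_updates(xml_bytes: bytes) -> list:
    """List each advisory with the fields it carries, to compare against the HTML."""
    root = ET.fromstring(xml_bytes)
    records = []
    for update in root.iter("update"):
        description = (update.findtext("description") or "").strip()
        issued = update.find("issued")
        records.append({
            "id": update.findtext("id"),
            "severity": update.findtext("severity"),
            "issued": issued.get("date") if issued is not None else None,
            "has_description": bool(description),
            # De-duplicated CVE IDs mentioned in the description text.
            "cves": sorted(set(CVE_PATTERN.findall(description))),
        })
    return records


for record in summarize_updates(SAMPLE_UPDATEINFO):
    print(record)
```

Running this over the full file and counting records where `has_description` is false would show how much, if anything, is only available on the HTML pages.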