CVEProject / cvelistV5

CVE cache of the official CVE List in CVE JSON 5 format

Vendors, Products and Versions are totally messed up #6

Open cookiengineer opened 2 years ago

cookiengineer commented 2 years ago

Hey there,

in the cvelist, all vendors and products and their versions are totally messed up.

First off, there seem to be more than one notation for the meaning of "n/a" (aka null). So far I've identified these notations: n/a, * n/a *, *** n/a ***, NONE, None, none, no, null, [UNKNOWN], [Unknown], Unknown.
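For consumers of the data, these spellings could be folded into a single null check. The following is a minimal Python sketch, assuming the list above covers the common cases (it probably does not cover all of them); `PLACEHOLDERS` and `is_placeholder` are hypothetical names, not part of any CVE tooling.

```python
import re

# Spellings of "no value" reported in this issue, lowercased and with
# decoration removed. Assumption: this set is illustrative, not complete.
PLACEHOLDERS = {"n/a", "none", "no", "null", "unknown"}


def is_placeholder(value):
    """Return True if value is missing or one of the known "n/a" spellings."""
    if value is None:
        return True
    # Drop surrounding asterisks, brackets and whitespace,
    # e.g. "*** n/a ***" or "[UNKNOWN]".
    core = re.sub(r"^[\s*\[\]]+|[\s*\[\]]+$", "", str(value))
    return core == "" or core.lower() in PLACEHOLDERS
```

Matching on a normalized core rather than on exact strings means new decorations of an already known spelling are caught as well, at the cost of possible false positives for a product that is really called "No".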

Additionally, all vendors and products are messed up. Sometimes there's the product field containing the actual versions that are affected in a comma separated list. Sometimes the Vendor is redundantly marked e.g. as Example, Inc and Example Corporation and Example. Siemens alone has more than 10 different notations.
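One way to surface such redundant vendor notations is to group names by a normalized key. This is only a sketch: the suffix list is an assumption, and `vendor_key` and `group_vendors` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumption: a small, incomplete list of corporate suffixes to ignore.
SUFFIXES = r"(?:inc|corp|corporation|ltd|llc|gmbh|ag|co)"


def vendor_key(name):
    """Lowercase a vendor name and strip punctuation and trailing suffixes."""
    key = re.sub(r"[.,]", " ", name.strip().lower())
    key = re.sub(rf"(?:\s+{SUFFIXES})+\s*$", "", key)
    return " ".join(key.split())


def group_vendors(names):
    """Map each normalized key to the set of original spellings seen."""
    groups = defaultdict(set)
    for name in names:
        groups[vendor_key(name)].add(name)
    return dict(groups)
```

Keys that map to more than one spelling are candidates for manual review; the grouping is a heuristic and should not be applied to the data automatically.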

The versions themselves are a whole other story, because most of them are also totally invalid. Even when there's a lessThan field set, sometimes the value of it is set to None. It gets even more ridiculous when the same CVE has two different affected versions which logically contradict each other.
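To make the two version problems concrete, here is a small Python sketch that flags placeholder bounds and one narrow kind of contradiction (the same version string listed with more than one status). The field names `version`, `status`, `lessThan` and `lessThanOrEqual` are taken from CVE JSON 5 records; the function names and the `NULLISH` set are assumptions made for illustration.

```python
RANGE_KEYS = ("lessThan", "lessThanOrEqual")
# Assumption: values treated as "no real version", compared case-insensitively.
NULLISH = {"", "n/a", "none", "null", "unknown"}


def version_problems(entry):
    """Return a list of problems found in one item of a `versions` array."""
    problems = []
    version = str(entry.get("version", "")).strip()
    if version.lower() in NULLISH:
        problems.append(f"placeholder version: {version!r}")
    for key in RANGE_KEYS:
        if key in entry and str(entry[key]).strip().lower() in NULLISH:
            problems.append(f"placeholder {key}: {entry[key]!r}")
    return problems


def contradictions(versions):
    """Return version strings that appear with more than one status."""
    seen = {}
    for entry in versions:
        seen.setdefault(str(entry.get("version")), set()).add(entry.get("status"))
    return sorted(v for v, statuses in seen.items() if len(statuses) > 1)
```

Overlapping ranges that contradict each other would need real version comparison, which depends on each entry's version scheme and is deliberately left out here.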

chandanbn commented 11 months ago

These are good points to provide guidance on in the best practices doc https://github.com/CVEProject/cve-schema/issues/241

sei-vsarvepalli commented 11 months ago

Hello @cookiengineer

Do you have the scripts/tools you used for this analysis that you are willing to share? Could you also provide a list of the CNAs that most often submit such contradictory data? This may help us write better guidance and eventually pursue more programmatic validation to keep driving quality into the collected CVE records.

Thanks Vijay

itipsai commented 5 months ago

Hello.

I have the same issue. For example, CVE-2021-41617 in the old-format file https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2021.json.zip has a detailed list of affected products:

  "description" : {
    "description_data" : [ {
      "lang" : "en",
      "value" : "sshd in OpenSSH 6.2 through 8.x before 8.8, when certain non-default configurations are used, allows privilege escalation because supplemental groups are not initialized as expected. Helper programs for AuthorizedKeysCommand and AuthorizedPrincipalsCommand may run with privileges associated with group memberships of the sshd process, if the configuration specifies running the command as a different user."
    } ]
  }
},
"configurations" : {
  "CVE_data_version" : "4.0",
  "nodes" : [ {
    "operator" : "OR",
    "children" : [ ],
    "cpe_match" : [ {
      "vulnerable" : true,
      "cpe23Uri" : "cpe:2.3:a:openbsd:openssh:*:*:*:*:*:*:*:*",
      "versionStartIncluding" : "6.2",
      "versionEndExcluding" : "8.8",
      "cpe_name" : [ ]
    } ]
  }, {
    "operator" : "OR",
    "children" : [ ],
    "cpe_match" : [ {
      "vulnerable" : true,
      "cpe23Uri" : "cpe:2.3:o:fedoraproject:fedora:33:*:*:*:*:*:*:*",
      "cpe_name" : [ ]
    }, {
      "vulnerable" : true,
      "cpe23Uri" : "cpe:2.3:o:fedoraproject:fedora:34:*:*:*:*:*:*:*",
      "cpe_name" : [ ]
    }, {
      "vulnerable" : true,
      "cpe23Uri" : "cpe:2.3:o:fedoraproject:fedora:35:*:*:*:*:*:*:*",
      "cpe_name" : [ ]
    } ]
  }, {

In the new version of the CVE record, at https://github.com/CVEProject/cvelistV5/blob/main/cves/2021/41xxx/CVE-2021-41617.json, the CPE information and the list of affected products are both empty:

    },
        "descriptions": [
            {
                "lang": "en",
                "value": "sshd in OpenSSH 6.2 through 8.x before 8.8, when certain non-default configurations are used, allows privilege escalation because supplemental groups are not initialized as expected. Helper programs for AuthorizedKeysCommand and AuthorizedPrincipalsCommand may run with privileges associated with group memberships of the sshd process, if the configuration specifies running the command as a different user."
            }
        ],
        "affected": [
            {
                "vendor": "n/a",
                "product": "n/a",
                "versions": [
                    {
                        "version": "n/a",
                        "status": "affected"
                    }
                ]
            }
        ],

Could you please clarify whether you plan to transfer the information about affected products to the new format and, if so, when? If no transfer is planned, do I understand correctly that this information will be lost in the new format?
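To illustrate what such a transfer would involve, here is a Python sketch that derives vendor, product and version bounds from one `cpe_match` entry of the old NVD 1.1 feed shown above. The output dictionary is a hypothetical intermediate shape, not a schema-valid `affected` entry, and `cpe_match_to_affected` is an invented name.

```python
import re


def cpe_match_to_affected(match):
    """Derive vendor, product and version bounds from one `cpe_match` entry."""
    # CPE 2.3 formatted string: cpe:2.3:part:vendor:product:version:...
    # Split on colons that are not escaped with a backslash.
    fields = re.split(r"(?<!\\):", match["cpe23Uri"])
    vendor, product, version = fields[3], fields[4], fields[5]
    result = {"vendor": vendor, "product": product}
    if version not in ("*", "-"):
        result["version"] = version  # a single exact version
    # Copy the range bounds when the NVD entry supplies them.
    for key in ("versionStartIncluding", "versionStartExcluding",
                "versionEndIncluding", "versionEndExcluding"):
        if key in match:
            result[key] = match[key]
    return result
```

For the OpenSSH entry above, this yields the vendor `openbsd`, the product `openssh`, and the bounds 6.2 (inclusive) to 8.8 (exclusive).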

xiconfjs commented 3 months ago

Same issue here with many CVEs where JSON 4.0 has far more detailed information. Just one example:

JSON 4.0:

{
    "cve" : {
      "data_type" : "CVE",
      "data_format" : "MITRE",
      "data_version" : "4.0",
      "CVE_data_meta" : {
        "ID" : "CVE-2024-28535",
        "ASSIGNER" : "cve@mitre.org"
      },
      "problemtype" : {
        "problemtype_data" : [ {
          "description" : [ {
            "lang" : "en",
            "value" : "CWE-787"
          } ]
        } ]
      },
      "references" : {
        "reference_data" : [ {
          "url" : "https://github.com/abcdefg-png/IoT-vulnerable/blob/main/Tenda/AC18/fromAddressNat_mitInterface.md", 
          "name" : "https://github.com/abcdefg-png/IoT-vulnerable/blob/main/Tenda/AC18/fromAddressNat_mitInterface.md",
          "refsource" : "",
          "tags" : [ "Exploit", "Third Party Advisory" ]
        } ]
      },
      "description" : {
        "description_data" : [ {
          "lang" : "en",
          "value" : "Tenda AC18 V15.03.05.05 has a stack overflow vulnerability in the mitInterface parameter of fromAddressNat function."
        } ]
      }
    },
    "configurations" : { 
      "CVE_data_version" : "4.0",
      "nodes" : [ {
        "operator" : "OR",
        "children" : [ ],
        "cpe_match" : [ {
          "vulnerable" : true,
          "cpe23Uri" : "cpe:2.3:o:tenda:ac18_firmware:15.03.05.05:*:*:*:*:*:*:*",
          "cpe_name" : [ ]
        } ]
      } ]
    },
    "impact" : {
      "baseMetricV3" : {
        "cvssV3" : {
          "version" : "3.1",
          "vectorString" : "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
          "attackVector" : "NETWORK",
          "attackComplexity" : "LOW",
          "privilegesRequired" : "NONE",
          "userInteraction" : "NONE",
          "scope" : "UNCHANGED",
          "confidentialityImpact" : "HIGH",
          "integrityImpact" : "HIGH",
          "availabilityImpact" : "HIGH",
          "baseScore" : 9.8,
          "baseSeverity" : "CRITICAL"
        },
        "exploitabilityScore" : 3.9,
        "impactScore" : 5.9
      }
    },
    "publishedDate" : "2024-03-12T13:15Z",
    "lastModifiedDate" : "2024-03-21T20:58Z"
  }

JSON 5.0:

{
    "dataType": "CVE_RECORD",
    "dataVersion": "5.0",
    "cveMetadata": {
        "state": "PUBLISHED",
        "cveId": "CVE-2024-28553",
        "assignerOrgId": "8254265b-2729-46b6-b9e3-3dfca2d5bfca",
        "assignerShortName": "mitre",
        "dateUpdated": "2024-03-12T12:47:50.996910",
        "dateReserved": "2024-03-08T00:00:00",
        "datePublished": "2024-03-12T00:00:00"
    },
    "containers": {
        "cna": {
            "providerMetadata": {
                "orgId": "8254265b-2729-46b6-b9e3-3dfca2d5bfca",
                "shortName": "mitre",
                "dateUpdated": "2024-03-12T12:47:50.996910"
            },
            "descriptions": [
                {
                    "lang": "en",
                    "value": "Tenda AC18 V15.03.05.05 has a stack overflow vulnerability in the entrys parameter fromAddressNat function."
                }
            ],
            "affected": [
                {
                    "vendor": "n/a",
                    "product": "n/a",
                    "versions": [
                        {
                            "version": "n/a",
                            "status": "affected"
                        }
                    ]
                }
            ],
            "references": [
                {
                    "url": "https://github.com/abcdefg-png/IoT-vulnerable/blob/main/Tenda/AC18/fromAddressNat_entrys.md"
                }
            ],
            "problemTypes": [
                {
                    "descriptions": [
                        {
                            "type": "text",
                            "lang": "en",
                            "description": "n/a"
                        }
                    ]
                }
            ]
        }
    }
}

sei-vsarvepalli commented 3 months ago

The Red Hat PSIRT team has also been helping us with this work and has created a tool, cvelint, available at https://github.com/mprpic/cvelint/tree/main, with an accompanying cvelint-action repository to capture these errors using GitHub Actions: https://github.com/jgamblin/cvelint-action

Please explore these tools, which also aim to improve the quality of CVE data, specifically in fields such as "affected products" as mentioned here.

xiconfjs commented 3 months ago

OK, for the CNA "mitre" I ran the tool on the 2024 CVEs and got 867 cases of Invalid version string: "n/a" (at "containers.cna.affected.#.versions.#.version"). What should we do now? Is there a way to report these? We have to fix them before the official [1] [2] termination of the legacy list.
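For anyone who wants a rough per-CNA tally without cvelint, a Python sketch along these lines could walk a local checkout of cvelistV5. It assumes the `cves/<year>/<NNxxx>/CVE-*.json` layout seen in the URL earlier in this thread, and it counts records rather than individual version entries, so the totals need not match cvelint's. `count_placeholder_versions` is a hypothetical name.

```python
import json
from collections import Counter
from pathlib import Path


def count_placeholder_versions(root, year):
    """Count records per CNA short name that contain a version of "n/a"."""
    counts = Counter()
    for path in Path(root, "cves", str(year)).rglob("CVE-*.json"):
        record = json.loads(path.read_text(encoding="utf-8"))
        cna = record.get("containers", {}).get("cna", {})
        versions = [
            v.get("version")
            for product in cna.get("affected", [])
            for v in product.get("versions", [])
        ]
        if "n/a" in versions:
            short_name = record.get("cveMetadata", {}).get("assignerShortName", "?")
            counts[short_name] += 1
    return counts
```

A call such as `count_placeholder_versions("path/to/cvelistV5", 2024).most_common(10)` would then list the ten CNAs with the most affected records, where the path is a placeholder for a local clone.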

[1] "(...) They will no longer be updated after June 30, 2024."
[2] https://www.cve.org/Media/News/item/blog/2024/03/12/Phase-3-Deprecation-of-Legacy-Downloads-Underway

sei-vsarvepalli commented 3 months ago

Hello @xiconfjs

The termination of the legacy list is not relevant to this issue, as the API-based feed will provide the same data anyway. Currently, the way to use the cvelint project is to filter out incomplete data when consuming CVE data with tools. As for the longer-term fix (updating these records and enforcing stricter input schema validation), the CVE Quality Working Group (QWG) https://www.cve.org/ProgramOrganization/WorkingGroups is working to move the needle toward better-quality CVE data. The documents we are writing are also helping CNAs take more responsibility for providing quality data in their CVE records.
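The "filter out incomplete data" approach could look roughly like the Python sketch below on the consumer side. It does not use cvelint itself; the placeholder list and the names `has_usable_affected` and `split_records` are assumptions made for illustration.

```python
def has_usable_affected(record, placeholders=("", "n/a", "unknown", "none")):
    """True if at least one CNA `affected` entry names a real vendor and product."""
    cna = record.get("containers", {}).get("cna", {})
    for entry in cna.get("affected", []):
        vendor = str(entry.get("vendor", "")).strip().lower()
        product = str(entry.get("product", "")).strip().lower()
        if vendor not in placeholders and product not in placeholders:
            return True
    return False


def split_records(records):
    """Partition parsed CVE records into (usable, incomplete) lists."""
    usable, incomplete = [], []
    for record in records:
        (usable if has_usable_affected(record) else incomplete).append(record)
    return usable, incomplete
```

Keeping the incomplete records in a separate list, rather than discarding them, leaves room to enrich them later from another source.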

Hope that makes sense!

xiconfjs commented 3 months ago

(...) as the API based feed will also provide the same data anyway. (...) there is some work going on the CVE Quality Working Group (QWG) (...) to try to move the needle forward in better quality of CVE data. (...)

Hi @sei-vsarvepalli,

thank you for trying to shed some light on this situation. But as presented before (https://github.com/CVEProject/cvelistV5/issues/6#issuecomment-2025558408), there are differences in the raw data between the legacy data source (https://www.cve.org/Downloads#legacy-format), which is provided in JSON 4.0 syntax, and the new data source (https://github.com/CVEProject/cvelistV5), provided in JSON 5.0. And as you can see, this example CVE (CVE-2024-28553) was published on 2024-03-12, so this is not old data that was migrated incorrectly. How does such a discrepancy come about?

Thanks @xiconfjs

xiconfjs commented 3 months ago

I have to clarify something: the "old" JSON data source I was referring to was not sourced from cve.org @ MITRE but from NVD @ NIST. So my question has to be rephrased: why does the NVD have higher-quality data? Or isn't NVD sharing with cve.org?