CVEProject / cve-schema

This repository is used for the development of the CVE JSON record format. Releases of the CVE JSON record format will also be published here. This repository is managed by the CVE Quality Working Group.
Creative Commons Zero v1.0 Universal
249 stars 138 forks source link

[Question] Clarification of the `source` attribute #48

Open mprpic opened 3 years ago

mprpic commented 3 years ago

Both the CNA and ADP containers include a source attribute that is defined as:

"source": {
    "type": "object",
    "description": "This is the source information (who discovered it, who
        researched it, etc.) and optionally a chain of CNA information (e.g.
        the originating CNA and subsequent parent CNAs who have processed it
        before it arrives at the MITRE root).\n Must contain: IF this is in the
        root level it MUST contain a CNA_chain entry, IF this source entry is
        NOT in the root (e.g. it is part of a vendor statement) then it must
        contain at least one type of data entry.",
    "minProperties": 1
},

What is the use case for this object? Can we get an example of its intended values? Vulnogram seems to use it to generate:

"source": {
    "advisory": "<CNA specific bug tracking IDs>",
    "defect": [<CNA specific advisory IDs (Optional)>],
    "discovery": "<some value>"
}

but none of that is defined in the schema and the values seem fairly arbitrary (assuming they will remain the same for 5.0).

chandanbn commented 3 years ago

Having fields to store CNA's defect IDs (bugzilla, jira), advisory IDs for direct reference/retrieval (instead of iterating through references) is useful for internal automation or cross referencing. We use Vulnogram to take a JIRA id as input and produce a CVE JSON draft as an output, so the JIRA id that was used for creating the first cut JSON is stored in this field. It is useful to our customers if they need additional support on the CVE and they/support can look up the JIRA/bugzilla id for additional details.

Discovery is a field to encode the source of vulnerability discovery. A consumer may consider a CVE discovered or encountered during normal usage more important than a CVE discovered under special laboratory conditions.

Vulnogram has four values: INTERNAL: this vulnerability was found by the CNA's internal research. EXTERNAL: this vulnerability was found during research external to a CNA. USER: This vulnerability was discovered during product use. UNKNOWN: Source of discovery is not defined or is unknown.

Some CNA's have also expressed the need for a fifth value: UPSTREAM: This vulnerability was found by an upstream vendor (who likely is not a CNA or did not assign a CVE).

Instead of discovery:value pairs we can perhaps define four new CVE tags "cna-internal-discovery", "cna-external-discovery", "found-during-use", and "upsteam-problem"?

jwhitmore-mitre commented 3 years ago

Source was left as an unstructured object as a carryover from the CVE 4.0. It's been an unstructured object as we have not had agreement within the community about what a source could be, nor a definitive list of value to make it an enumeration.

Different publishers have historically used different structures within the property to describe how the vulnerability was discovered and reported.

mprpic commented 3 years ago

If source is supposed to be a free-for-all container to include arbitrary key-value data, then let's at least remove the current description that mentions CNA chains (which I still have no idea what that means).

It also doesn't feel very tidy to have an unstructured object in an official spec. Isn't that what x_ properties are for?

chandanbn commented 3 years ago

submitted #60 for this

ccoffin commented 1 week ago

Discussed this in the September 12, 2024 QWG Meeting. Maybe just use discovery field with a subset of the current Vulnogram options. NOT upstream as this doesn't work well in this context. Mine the List and determine how it is used currently. Maybe abandon source object and start over with needed fields.

See https://github.com/CVEProject/cve-schema/issues/317

jayjacobs commented 4 days ago

The following are observations from the CVE data as of Sep 18, 2024.

Top Level Fields

  field        cve_count
1 discovery        39262
2 advisory         14905
3 defect            5032
4 lang               949
5 value              949
6 defects            360
7 found_during         1

Discovery

There seems to be some confusion about what goes in this field, but the current values break down like this:

EXTERNAL            :   17715
UNKNOWN             :   16837
INTERNAL            :   4064
USER                :   600
Will Dormann of CERT:   12
                    :   9
Internal            :   5
Discovery statement :   5
Brett Casper [/](https://file+.vscode-resource.vscode-cdn.net/) Wisco:  3
external            :   2
Neil Graves, Jorian :   2
UPSTREAM            :   2
Dr. Florian Hauser, :   1
CUSTOMER            :   1
Toronto-Dominion Ban:   1
Niv Levy            :   1
ING Bank N.V.       :   1
External            :   1

Advisory

https://www.whitesourcesoftware.com/vuln:   64
cisco-sa-rv-overflow-WUnUgv4U           :   61
cisco-sa-sb-rv-rce-overflow-ygHByAK     :   35
cisco-sa-rv-overflow-ghZP68yj           :   30
ICSA-22-081-01                          :   29
AMD-SB-1000                             :   27
AMD-SB-1032                             :   27
https://www.mend.io/vulnerability-databa:   26
AMD-SB-4002, AMD-SB-3002, AMD-SB-5001   :   21
AMD-SB-1021                             :   20
AMD-SB-1027                             :   19
https://us-cert.cisa.gov/ics/advisories/:   17
cisco-sa-smb-mult-vuln-KA9PK6D          :   15
cisco-sa-fmc-xss-LATZYzxs               :   15
cisco-sa-rv-stored-xss-vqz7gC8W         :   15
VDE-2023-019                            :   15
ICSA-22-298-06                          :   14
VDE-2023-018                            :   14
cisco-sa-20181003-webex-rce             :   13
ODOO-SA-2020-12-02                      :   13
cisco-sa-20191016-spa-rce               :   13
ICSA-21-280-05                          :   13
cisco-sa-csm-mult-xss-7hmOKQTt          :   13
VDE-2024-011                            :   13
JSA10918                                :   12
...
TVN-202409018                           :   1
TVN-202409019                           :   1
TVN-202409020                           :   1
TVN-202409026                           :   1

Defect

this is split between objects (dict) and arrays and strings as outline in #317

"lang" and "value"

The CVEs with the "lang" and "value" combinations appear to treat this as the "credits" portion, for example:

{"lang": "en", "value": "Mat Powell of Trend Micro Zero Day Initiative"}

appears in 181 CVEs.

jayjacobs commented 4 days ago

one last addition, If I look for any value in the "source" section, out of all the CVEs published in a 90 day rolling window, we are somewhere around 40% of CVEs including something in this section of the data: image