Closed rsc closed 3 years ago
+1 for GIT commit IDs as they help locate the vulnerable instances of code more precisely than versions.
Software changes represent a tree (directed acyclic graph) structure. Each commit results in new software - a node in this tree. Each fork results in a new branch. Some nodes get labeled as versions.
For any given node and any given vulnerability:
It is possible that we have two more colors (but we can ignore them for now for simplicity)
The problem we have: given a three-colored tree, we need to encode/serialize the graph so that it captures this information accurately, with as little ambiguity as possible, and makes it easy to determine whether any given version is affected.
For example:
2.8 → 2.9 ---→ 2.10 → 2.11 → 2.12
↳ 3.0 → 3.1 ----→ 3.2 → 3.3
↳ 4.0 → 4.1 → 4.2 → 4.3
Let's say 3.0 was branched off sometime before 2.10 was released, and 4.0 was branched off before 3.2 was released.
If the JSON was like:
"affects": {
"ranges": [
{"type": "SEMVER", "introduced": "2.10", "fixed": "4.3"},
]
}
Though it is easy to compute that 4.0 thru 4.2 are vulnerable, how do you determine the vulnerability status of 3.0 thru 3.3 and 2.8 thru 2.12 (except 2.10)? Does semver capture the information about how one linear branch is related to another?
Extending the above example, this can be expressed like so:
Assuming that:
"affects": {
"ranges": [
{"type": "ECOSYSTEM", "introduced": "2.9", "fixed": "2.10"},
{"type": "ECOSYSTEM", "introduced": "3.0", "fixed": "3.2"},
{"type": "ECOSYSTEM", "introduced": "4.0", "fixed": "4.3"},
]
}
These conditions are evaluated with OR. A version is affected if it falls into any of those ranges. Using this we should be able to describe any set of ranges unambiguously. Describing ranges this way also makes it more easily understandable by human users if they want to know which versions they upgrade to if they're impacted.
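A minimal sketch of that OR evaluation, assuming versions are plain dotted integers such as "2.10" (the helper names `parse` and `is_affected` are hypothetical and not part of any schema):

```python
def parse(version: str) -> tuple:
    """Turn a dotted version such as "2.10" into (2, 10) so it compares numerically."""
    return tuple(int(part) for part in version.split("."))

# The three per-branch ranges from the example above.
RANGES = [
    {"type": "ECOSYSTEM", "introduced": "2.9", "fixed": "2.10"},
    {"type": "ECOSYSTEM", "introduced": "3.0", "fixed": "3.2"},
    {"type": "ECOSYSTEM", "introduced": "4.0", "fixed": "4.3"},
]

def is_affected(version: str, ranges=RANGES) -> bool:
    """A version is affected if it falls in ANY half-open range [introduced, fixed)."""
    v = parse(version)
    return any(parse(r["introduced"]) <= v < parse(r["fixed"]) for r in ranges)
```

Because each branch has its own range, a checker like this never has to know how the branches relate to one another.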
Proposal
Replace "versions" in the current product object with these fields:
"affectedVersions": [{
"range": string, // "semver", "git", "other"; optional, missing means not a range
"version": string, // specific version, or start of range; required
"before": string, // range ends just before this version; required when range is present
"unspecified": bool, // true if vulnerability status is unspecified (as opposed to asserted vulnerable); optional (default false)
"repo": string, // for type git (repository holding code); optional
}],
"testedVersions": [{
"version": string, // specific version; required
"vulnerable": bool, // required
}],
"platforms": [string],
"references": [string],
Rationale
The discussion at the quality working group meeting brought up two important points:
Vendors may not want to commit to a specific “introduced” version. This is the reason for the ?, which I had tried to model as unsure. It's not really “unsure” so much as “undisclosed” or “unspecified.” (“We don't want to say.”)
Security researchers can only report results for specific versions that they have tested. In that context it is useful to say things like “1.1.0 is vulnerable; 1.5.0 is not.”
At the meeting, we spent a little while trying to figure out how to get all that into a single version object. Afterward, based on additional thought and discussion, @ochang and I propose that it would make sense to have two separate lists, with different uses and consumers.
First, there is the “affected versions” list, which is ideally an algorithmically precise description of an answer to the question “does version X contain this vulnerability?” Perhaps there are three possible answers — yes, no, perhaps — but it's still clear what the answer is for any given version X.
This list is consumed by programs that users run such as SBOM-based vulnerability scanners. In this list, if a version is not listed, the implication is “no, it is not affected.” That is, there is no need to enumerate all the unaffected versions.
"affectedVersions": [{
"range": string, // "semver", "git", "other"; optional, missing means not a range
"version": string, // specific version, or start of range; required
"before": string, // range ends just before this version; required when range is present
"unspecified": bool, // true if vulnerability status is unspecified (as opposed to asserted vulnerable); optional (default false)
"repo": string, // for type git (repository holding code); optional
}],
A version object {"version": V} describes the single version V, for use when vendors need to enumerate the exact list of affected versions one at a time, such as when their version numbering isn't one of the known computable types.
Otherwise, a version object {"range": ..., "version": V, "before": W} describes the half-open range of versions v such that V ≤ v and v < W, according to the precise ordering defined by the "range" setting. (Note that W is not included.)
For a range, both "version" and "before" are required, but using the value "*" for "version" removes the lower bound, and using "*" for "before" removes the upper bound:
{"range": "semver", "version": "*", "before": "1.2.3"} describes all versions before 1.2.3.
{"range": "semver", "version": "1.2.3", "before": "*"} describes all versions 1.2.3 or later.
{"range": "semver", "version": "*", "before": "*"} describes all versions ever.
Most projects will ignore unspecified, in which case listed versions are affected and unlisted ones are taken to be unaffected. When a vendor wants to cast doubt on a version without specifically identifying it as vulnerable, they can use
{"version": "1.2", "unspecified": true}
or
{"range": "semver", "version": "1.2.3", "before": "1.4.5", "unspecified": true}
Again, versions not explicitly listed are implicitly unaffected. There is no explicit "unaffected" status to cause confusion with the implicit status of not being listed.
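To make the three possible answers concrete, here is a rough sketch of how a consumer might evaluate an "affectedVersions" list. It assumes dotted-integer versions rather than a full semver ordering, and it assumes that an asserted-vulnerable entry wins over an "unspecified" one when both match; neither assumption comes from the proposal, and the function names are hypothetical.

```python
def parse(version):
    """Dotted-integer parse; a real "semver" ordering would also handle pre-release tags."""
    return tuple(int(p) for p in version.split("."))

def entry_matches(entry, version):
    """True if `version` is the listed single version or inside the half-open range."""
    if "range" not in entry:                      # no "range": a single exact version
        return entry["version"] == version
    v = parse(version)
    lower_ok = entry["version"] == "*" or parse(entry["version"]) <= v
    upper_ok = entry["before"] == "*" or v < parse(entry["before"])
    return lower_ok and upper_ok

def status(affected_versions, version):
    """Return "yes", "perhaps" (unspecified) or "no" (not listed)."""
    result = "no"
    for entry in affected_versions:
        if entry_matches(entry, version):
            if not entry.get("unspecified", False):
                return "yes"                      # assumption: asserted vulnerable wins
            result = "perhaps"
    return result

affected_versions = [
    {"range": "semver", "version": "*", "before": "1.2.3"},
    {"range": "semver", "version": "1.2.3", "before": "1.4.5", "unspecified": True},
]
```

With this list, anything before 1.2.3 is "yes", 1.2.3 up to but not including 1.4.5 is "perhaps", and everything else falls through to the implicit "no".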
Second, there is a separate “tested versions” list, for recording the results of security research.
This list is consumed only by security researchers and vendors, not programs. In this list, if a version is not listed, the implication is “there are no test results for this version.” That's a very different statement than the first list.
"testedVersions": [{
"version": string, // specific version; required
"vulnerable": bool, // required
}]
Here the version is explicitly deemed either vulnerable or not — there is no unspecified. Versions that have not been tested, or for which tests were inconclusive, are omitted from the list.
Another possible use of this second list would be automated systems that run proof-of-concept or other tests against specific versions. Such systems could record their results in the "testedVersions" list as a way to confirm (or refute) the claimed "affectedVersions".
These two separate lists seem to separate out two distinct use cases nicely, making it possible to serve both well with separate mechanisms, where before it seemed impossible to serve them well with a single mechanism.
At the working group meeting it sounded like there was consensus to move "platforms" out of the version list and into the outer product object. I have moved "references" out as well, since it seemed even more likely to be version-independent than "platforms".
[Edit: Streamlined the two objects a bit.]
I noted above that I moved platforms out, as discussed, and also references, since the same rationales seemed to apply. I think maybe we should move repo out as well. The code will come from a single repo that will not vary from version to version. So repo could go into the outer product object too.
testedVersions seems to be ok for making "not affected" assertions per single version. How do we encode a range of not affected?
Consider an example: a vuln is introduced in 2.12 and fixed in 2.14, but for some reason (like a mistake in resolving code conflicts) 2.16 and 2.17 are vulnerable again, and it then gets fixed in 2.18. (Such things are rare but do happen.)
.. 1.0 → 1.1 → 1.2 → 1.3 ... ... 2.10 → 2.11 → 2.12 → 2.13 → 2.14 → 2.15 → 2.16 → 2.17 → 2.18
( *= affected)
{"range": "semver", "version": "2.12", "before": "2.14"}
{"range": "semver", "version": "2.16", "before": "2.18"}
How do we affirmatively say 2.14 and 2.15 are not affected as a range? Let's say 1.x was an unmaintained branch that was never evaluated. Since the bug was introduced in 2.12, the CNA wants to assert 1.x is unlikely to be affected.
Instead of testedVersions and unspecified can we do this with an optional rangeAffected ['affected' (default), 'unaffected', 'unspecified', 'likely', 'unlikely']?
{"range": "semver", "version": "1.0", "rangeAffected": "unlikely"}
{"range": "semver", "version": "2.12", "before": "2.14"}
{"range": "semver", "version": "2.14", "before": "2.16", "rangeAffected": "unaffected"}
{"range": "semver", "version": "2.16", "before": "2.18"}
(a consumer can consider likely to be same as affected for vulnerability management)
Another suggestion is to add a new value for range : "patch" - for products that use patching (like CVE-2017-4905 )
product: "ESXi"
versions:
{ "range": "patch", "version": "6.5", "before": "ESXi650-201703410-SG"}
{ "range": "patch", "version": "6.0 U3", "before": "ESXi600-201703401-SG"}
{ "range": "patch", "version": "6.0 U2", "before": "ESXi600-201703403-SG"}
...
+1 for a rangeAffected type quantifier. This would allow the schema to simplify affectedVersions and testedVersions into a common versions property. This ensures parity in affected and unaffected version expression without duplication of the field.
This change would ultimately open up the version property for more additions in subsequent minor versions if desired, such as likely and unlikely, without strictly breaking compatibility.
Hi @rsc . Thanks for providing that proposal!
A few things that come to mind if we were to adopt this, from a naive perspective (on purpose):
1.) How would I specify a range if the vuln isn't fixed yet and/or plans to fix are not known? Currently, I'd be required to provide a version and a before, but what if I don't know what before is since it doesn't yet exist?
2.) The unspecified boolean is still a bit muddy to me. This doesn't necessarily mean that I feel that it doesn't belong, but I think if we go this route, it'll be very important to document exactly what the semantics are behind it and provide some example cases as well. Do you have some specific scenarios in mind with how this would be used? If I'm a vendor, and I say "We're registering CVE-0000-00000 for project X, versions 1.0 before 1.9 are affected" and "versions 2.0 before 2.7 are affected, unspecified," what am I saying and how is it useful?
3.) I understand the use-case for testedVersions as described and in my experience, it is very common for researchers to identify that they've tested just 1 version, get e.g. an ASAN dump and then make an upstream bug on an open source project as one example. We as a vendor don't immediately know all of which versions are affected from that info alone and the researcher is not making any assertions outside of the reported version.
I guess my only question is (and perhaps it's just a question of naming): Is it semantically sound to separate testedVersions from affectedVersions? Other than testing, how are claims for affectedVersions being made? Are we implying that affectedVersions have not been tested if they appear in one list and not the other? Again, I get the underlying use-case, but that's what I understand when I consume both lists. Is affectedVersions only to be used exclusively for reports from security researchers?
Put another way, if I'm a security engineer at a vendor, and I'm assigning a CVE for package foo; I've tested version 1.5 and found it to be vulnerable by actually reproducing the flaw, but I'm told by upstream that it affects all versions before 1.5 as well, would I set e.g. {"range": "semver", "version": "1.0", "before": "1.6"} in affectedVersions and testedVersions to {"version": "1.5", "vulnerable": true} because I actually tested/reproduced on 1.5 but I'm told it affects those prior versions as well, which I have not tested for whatever reason (such as not supported, not shipped, etc...)? What about platforms here? As in, researcher tested on platform X but vendor reports affected versions on platform Y? Just trying to confirm that I understand the usage.
Thanks again for doing this, and I'd re-iterate the importance of documenting the implications you mentioned such as:
Versions that have not been tested, or for which tests were inconclusive, are omitted from the list.
In this list, if a version is not listed, the implication is “no, it is not affected.” That is, there is no need to enumerate all the unaffected versions.
etc... Some areas of the schema are widely up to user interpretation for usage, and others it seems beneficial for the community to have some conformity on, so just want to ensure we make those areas well known, as this data is only useful when interpreted properly.
Lastly, I think you meant to tag @oliverchang ! :)
@cganas thanks for the feedback.
+1 for a rangeAffected type quantifier. This would allow the schema to simplify affectedVersions and testedVersions into a common versions property. This ensures parity in affected and unaffected version expression without duplication of the field.
A common versions property has the problem of not having a clear meaning for versions not explicitly listed. The two different use cases have two different natural semantics:
Merging these different natural semantics into a single field makes the meaning of unlisted contradictory and unclear.
It seems like a significant step forward in clarity to separate the two uses.
@tcullum-rh thanks for the feedback
1) The idea was to use "before": "*" to explicitly indicate "there is no upper end to this range". And then if a fix was issued later you'd update the record, of course.
2) I am not at all attached to "unspecified" as the name for this middle-ground, nor do I really claim to understand the use case. What I thought I heard at the meeting was that a vendor wants a way to tell users "act as though these are vulnerable" without actually claiming (or admitting?) that they are in fact vulnerable. Suggestions welcome.
3) I agree with what happens in your scenario. Generally, I think the answers come down to what is consuming these fields. affectedVersions is for programs reporting vulnerabilities to users, doing automated upgrades, etc. In that case, the goal is to list the ranges for which action should be taken (along with perhaps the qualifier on the confidence that action really is needed, from 2). testedVersions is for researchers to document what they've tested and doesn't feed into the same automated systems. A security researcher at a vendor is probably focused on the first use case, in which it's probably enough to just list affectedVersions and not bother with testedVersions at all, even though some testing has been done of course.
Thanks for pointing out the username snafu. Sorry @oliverchang!
@chandanbn thanks for the feedback.
testedVersions seems to be ok for making "not affected" assertions per single version. How do we encode a range of not affected?
I think it would be fine to encode ranges there, although a researcher without access to the source code repo may have difficulty making such broad assertions. We could do it the same way as in the affectedVersions list.
Instead of testedVersions and unspecified can we do this with an optional rangeAffected ['affected' (default), 'unaffected', 'unspecified', 'likely', 'unlikely']?
This runs back into the issue I was trying to solve with the split, which I mentioned in https://github.com/CVEProject/cve-schema/issues/87#issuecomment-894647308 as well.
Specifically, once there is an explicit status that has the same meaning as not listing a version at all, then it becomes unclear whether you are supposed to list things explicitly or not. Consider:
affectedVersions: [{"version": "1.2.3"}]
vs
testedVersions: [{"version": "1.2.3", "vulnerable": true}]
What does each say about 1.4.5?
The idea was that the affectedVersions line says (by not listing it) that 1.4.5 is unaffected, and similarly the testedVersions line says (by not listing it) that 1.4.5 is untested, which is a different statement.
If there is a single field, then unlisted can have only one meaning.
If unlisted means unaffected, then the security researcher has to write something like:
versions: [
{"version": "*", "before": "1.2.3", "status": "untested"},
{"version": "1.2.3", "status": "affected"},
{"version": "1.2.4", "before": "*", "status": "untested"},
]
when all they really want to say is "there's a vulnerability in 1.2.3".
If unlisted means status unknown, then the vendor issuing instructions to users needs to write
versions: [
{"version": "*", "before": "1.2.3", "status": "unaffected"},
{"version": "1.2.3", "status": "affected"},
{"version": "1.2.4", "before": "*", "status": "unaffected"},
]
when all they really want to say is "only 1.2.3 is affected".
It seems like inevitably people are going to write the 1-line version when they "should" be writing the 3-line versions.
The two different fields allow two different defaults, which should make the authoring of these more natural and less prone to error, as well as clearer in meaning.
IMHO we are solving two problems here:
When software versioning is linear, listing the affected range is sufficient. A tool should interpret versions outside of the range as 'unaffected'. This proposal is perfectly adequate and intuitive to use.
The difficulty comes when software has multiple concurrently maintained branches (e.g., Linux, OpenSSL). Ranges that span multiple branches may not make sense. Often the CVE assigner does not make statements about older branches; they may not be listed in a CVE but are likely affected. Without additional context (like EOL status), a tool can misreport an older version as unaffected. That is dangerous because people may have a vulnerability they should care about, but tools may fail to warn them.
Take https://www.linuxkernelcves.com/cves/CVE-2021-3655
versionGroup: fixed version
4.14: 4.14.240
4.19: 4.19.198
5.10: 5.10.51
5.12: 5.12.18
5.13: 5.13.3
5.4: 5.4.133
Since 4.15.1 isn't listed there should a tool report it as unaffected?
My suggestion to solve the info capture problem:
To interpret the records:
At the minimum when there is only one range of affected versions, this is sufficient:
versions: [
{ version: '*', before : '5.14-rc1' }
]
When there are branches with multiple ranges, this should be sufficient:
versions: [
{ versionGroup: 4.14, start: 4.14.0, before: 4.14.240 }
{ versionGroup: 4.19, start: 4.19.0, before: 4.19.198 }
{ versionGroup: 5.10, start: 5.10.0, before: 5.10.51 }
]
A few optional entries reinforce the facts and would help tooling make accurate determinations.
{ start: 16.0.0, status: 'not-affected' }
How about:
versions: [
{
"range": [ semver, git, patch, other ] // optional, missing means not a range
"versionGroup": string // (optional) represents a version branch, group, or a major version (e.g. 10.0, 3.1.*) where these ranges are meaningful.
"version": string, // specific version, or start of range; required
"before": string, // range ends just before this version; required when range is present
"status": [ affected (default), unaffected, undefined, likely-affected, unlikely-affected] // optional, consider 'affected' if absent
}
]
My concern with versionGroup as-is is that it relies on the consumer of such entries to know how to map version to a versionGroup. There could be many different ways to do so, depending on the versioning scheme or ecosystem.
We may want something like this to describe a "versionGroup", instead of just a "string".
{ versionGroup: { start: 4.14.0, before: 4.15.0 }, version: 4.14.0, before: 4.14.240 }
This does make the entries a bit difficult for a human to read if they're inline (because there are two ranges in each entry), so perhaps it could be indirect by adding a new field to define versionGroups, and have the individual ranges reference that (as per your examples).
"versionGroups": {
"4.14": {
"start": "4.14.0",
"before": "4.15.0",
}
}
"versions": [ { versionGroup: 4.14, version: 4.14.0, before: 4.14.240 } ]
On interpreting these entries: I think different consumers will want some flexibility depending on risk / noise appetite, as it's ultimately up to the consumer how to deal with incomplete data.
For example, they could assume (or know) the data is high quality/complete and ignore versionGroup altogether, and assume anything that's unlisted is strictly unaffected (rather than unspecified / unknown).
My understanding is that grouping ranges by versionGroup creates some implicit "unspecified" ranges (i.e. any unlisted groups of versions are implied to be "unspecified"). So, a consumer could also do as you suggested, where unspecified (implicit or explicit) is assumed to be likely vulnerable.
Using this as an example again:
versions: [
{ versionGroup: 4.14, version: 4.14.0, before: 4.14.240 }
{ versionGroup: 4.19, version: 4.19.0, before: 4.19.198 }
{ versionGroup: 5.10, version: 5.10.0, before: 5.10.51 }
]
Testing 4.14.250, this matches group 4.14 but does not match any ranges there. This is unambiguously unaffected.
Testing 4.15.0, no groups matched, which means 4.15.0 is unspecified. It is up to the consumer how to interpret it.
Testing 5.11.0, no groups matched, so it's unspecified, but it's higher than any listed before (i.e. "5.10.51"), so a consumer may interpret this as unaffected.
Ignoring versionGroup completely also has the same effect as treating "unspecified" ranges as "unaffected".
Does my understanding seem correct?
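For what it's worth, here is a small sketch of that interpretation. It assumes a version belongs to a group when it starts with the group name followed by a dot, and that versions are dotted integers; both are assumptions for illustration, since how to match a version to a group is exactly the open question. `classify` is a hypothetical name.

```python
def parse(version):
    """Dotted-integer parse, e.g. "4.14.240" -> (4, 14, 240)."""
    return tuple(int(p) for p in version.split("."))

VERSIONS = [
    {"versionGroup": "4.14", "version": "4.14.0", "before": "4.14.240"},
    {"versionGroup": "4.19", "version": "4.19.0", "before": "4.19.198"},
    {"versionGroup": "5.10", "version": "5.10.0", "before": "5.10.51"},
]

def classify(version, entries=VERSIONS):
    """Return "affected", "unaffected" or "unspecified" for one version string."""
    v = parse(version)
    in_known_group = False
    for e in entries:
        # Assumption: a version belongs to a group if it starts with "<group>.".
        if version.startswith(e["versionGroup"] + "."):
            in_known_group = True
            if parse(e["version"]) <= v < parse(e["before"]):
                return "affected"
    # Inside a listed group but outside its ranges: unambiguously unaffected.
    # Outside every listed group: left for the consumer to interpret.
    return "unaffected" if in_known_group else "unspecified"
```

Under these assumptions a version in group 4.14 that is past the fix, such as 4.14.250, comes out "unaffected", while 4.15.0 and 5.11.0 both come out "unspecified" and are left to the consumer's policy.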
In any case, this doesn't change the meaning of {version, before} within a versionGroup -- because a version that doesn't match any (non-unspecified) ranges within a group still unambiguously means "unaffected". So I don't know if it answers whether we need both affectedVersions and testedVersions for the reasons @rsc outlined in https://github.com/CVEProject/cve-schema/issues/87#issuecomment-894649276 ?
Chandan proposes " { versionGroup: 4.14, start: 4.14.0, before: 4.14.240 } " and this matches my experience handling vulnerability metadata for OpenSSL and various Apache projects (where they are not semver).
For OpenSSL we combined having a 'fixed version' (for a given major version) along with listing all the known affected versions individually: https://www.openssl.org/news/vulnerabilities.xml
<affects base="1.1.1" version="1.1.1e"/>
<affects base="1.1.1" version="1.1.1f"/>
<fixed base="1.1.1" version="1.1.1g" date="20200421">
<git hash="eb563247aef3e83dda7679c43f9649270462e5b1"/>
</fixed>
which would become " { versionGroup: 1.1.1, start: 1.1.1d, before: 1.1.1g } "
or
<affects base="1.1.1" version="1.1.1a"/>
<affects base="1.1.1" version="1.1.1b"/>
<affects base="1.1.1" version="1.1.1c"/>
<affects base="1.1.1" version="1.1.1d"/>
<affects base="1.0.2" version="1.0.2"/>
<affects base="1.0.2" version="1.0.2a"/>
<affects base="1.0.2" version="1.0.2b"/>
...
<affects base="1.0.2" version="1.0.2t"/>
<fixed base="1.1.1" version="1.1.1e" date="20191206">
<git hash="419102400a2811582a7a3d4a4e317d72e5ce0a8f"/>
</fixed>
<fixed base="1.0.2" version="1.0.2u" date="20191220">
<git hash="f1c5eea8a817075d31e43f5876993c6710238c98"/>
</fixed>
which would become " { versionGroup: 1.1.1, start: 1.1.1, before: 1.1.1e } , { versionGroup: 1.0.2, start: 1.0.2, before: 1.0.2u } , "
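A rough sketch of that conversion, using only the 1.0.2 lines from the excerpt above. The `<issue>` wrapper element is hypothetical (the surrounding structure of the real file isn't shown here), and the code assumes the `<affects>` entries are listed oldest-first within each base:

```python
import xml.etree.ElementTree as ET

# Hypothetical <issue> wrapper around a subset of the 1.0.2 entries quoted above.
SNIPPET = """
<issue>
  <affects base="1.0.2" version="1.0.2"/>
  <affects base="1.0.2" version="1.0.2a"/>
  <affects base="1.0.2" version="1.0.2b"/>
  <affects base="1.0.2" version="1.0.2t"/>
  <fixed base="1.0.2" version="1.0.2u" date="20191220"/>
</issue>
"""

def to_version_groups(xml_text):
    """Collapse per-version <affects> entries into one range per base."""
    root = ET.fromstring(xml_text.strip())
    first_affected = {}
    for node in root.iter("affects"):
        # Assumption: entries are listed oldest-first, so the first one seen is the start.
        first_affected.setdefault(node.get("base"), node.get("version"))
    ranges = []
    for node in root.iter("fixed"):
        base = node.get("base")
        if base in first_affected:
            ranges.append({"versionGroup": base,
                           "start": first_affected[base],
                           "before": node.get("version")})
    return ranges
```

Relying on listing order sidesteps having to define an ordering for letter-suffixed versions like 1.0.2a, which a real converter would probably need.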
Problem 1: quite often the OSS project doesn't have resources to make sure we know "earliest affected version" (for example it might be too hard to determine what old things are affected particularly if things got refactored). So does the lack of 1.0.2 in that first example mean it's not vulnerable (which it does) or that we no longer look at how 1.0.2 is affected?
Problem 2: So if there is an old EOL branch it's quite likely the OSS project won't even look if that one was vulnerable. So how about the OpenSSL 0.9.8 version? As the upstream we don't tell you. But other consumers of OpenSSL who patched it after upstream stopped (like long life distro branches, Red Hat etc), probably did that work to figure out all the affected EOL versions too.
Second example which is similar, before I switched ASF httpd to JSON 4.0....
view-source:https://web.archive.org/web/20200416103646/http://httpd.apache.org/security/vulnerabilities-httpd.xml
<fixed base="2.4" version="2.4.27" date="20170711"/>
<fixed base="2.2" version="2.2.34" date="20170711"/>
...
<affects prod="httpd" version="2.4.1"/>
...
<affects prod="httpd" version="2.2.0"/>
So that would become " { versionGroup: 2.2, start: 2.2.0, before: 2.2.34 } , { versionGroup: 2.4, start: 2.4.1, before: 2.4.27 } , "
But for ASF when we hadn't verified but it looked plausible....
<maybeaffects prod="httpd" version="2.0.49"/>
(Although for the JSON format I just lazy converted those into 'affects')
(We also had the occasional "won't fix" where "2.2. is affected, we didn't fix it in 2.2" and the occasional "2.2. is affected, it's fixed by an available patch/svn head, but not in any released version")
Problem 3: Distro versions will vary. You could normally just say this is out of scope, but it's likely most of the users of say OpenSSL will be using a distro packaged version. And they backport security fixes. It's why at Red Hat we introduced OVAL for all our errata so you could map a given Red Hat RPM version of (Apache HTTP Server, OpenSSL, anything) to CVE.
Ignoring versionGroup completely also has the same effect as treating "unspecified" ranges as "unaffected". Does my understanding seem correct?
As you said, if the data set is complete, we don't need versionGroup. A tool can easily say anything unlisted is unaffected. When the data is incomplete (and it will often be), telling consumers/tools to assume the unlisted is unaffected is dangerous.
Take CVE-2021-33909 for example: It was introduced by commit 058504edd02667eef8fac9be27ab3ea74332e9b4 in Linux Kernel 3.16. It was fixed by commit 8cae8cd89f05f6de223d63e6d15e31c8ba9cf53b in a v5.14-rc branch.
Whoever requested the CVE at the time of assignment may have said it affected Linux Kernel from 3.16 to before 5.13.4, likely because that was the only information available at the time. That is sufficient to get a CVE - we should not be waiting for all the information to be available.
Now that vulnerability seems to have been fixed in each of the actively maintained Linux kernel branches, each with a different commit ID, for example:
4.14 --> before: 3c07d1335d17ae0411101024de438dbc3734e992
4.19 --> before: 6de9f0bf7cacc772a618699f9ed5c9f6fca58a1d
5.13 --> before: 71de462034c69525a5049fbdf3903c5833cbce04
The entry in OSV seems to have picked only one affected range with a fix commit ID for just one branch, 4.14. So the list of versions listed as affected is not telling the whole truth. For example, it does not list 5.13.3 as affected. If one were to take anything not listed as unaffected, then a tool consuming that data would wrongly (and dangerously) say 5.13.3 is unaffected, which is not true here.
I believe we all agree:
Given the above:
The entry in OSV seems to have picked only one affected range with a fix commit id for just one branch 4.14. So the list of versions listed as affected is not telling the whole truth. For eg., it does not list 5.13.3 as affected. If one were to take anything not listed as unaffected, then a tool consuming that data would wrongly (and dangerously) say 5.13.3 is unaffected which is not true here.
Thanks for flagging this example! This was actually an intentional decision by the providers of this data to track different branches in different vulnerability IDs. For example, for the 5.13 branch, this is tracked by https://osv.dev/vulnerability/UVI-2021-1001182. There are other variations for different branches, and with open source we have the ability to be precise/complete with tooling to detect cherry picks across branches etc.
But yes, I understand the concern with incomplete data in general!
- Capturing machine readable information about branching seems out of scope for CVE. (Question: Does semvers have a convention for how branches are versioned?)
I don't believe semver (or most versioning schemes) enforces any conventions around branch versioning. If we provide clear rules on how to match a version to a group by saying it's a string prefix (i.e. "versionGroup": "2.4."), perhaps that will be sufficient to avoid having to capture explicit branch information?
Given the above:
- We try to make it easier for people to capture this information (even if partial) in a consistent, intuitive, and uniform way.
- Provide ways to capture assertive not-affected statements since many CNAs state that in the CVE descriptions.
- Provide a way to limit the scope of assertions (versionGroup) so datasets are at least complete for some areas.
- Provide heuristics for tools to make sense of partial information so they can still make safer affected/likely-affected/not-affected determinations.
What you proposed with versionGroups seems like it should address most of these, but I think it adds a fair bit of complexity and edge cases for processors to handle.
Perhaps another flatter alternative, and one that tries to make the two cases (complete vs incomplete data) more explicit would be:
"versions": [
{
"range": string,
"version": string, // specific version, or start of range; required
"before": string, // range ends just before this version; required when range is present
"status": string // optional can be "affected" (default) / "unaffected".
}
]
"versionsInfo": {
"complete": bool, // true or false based on if the provider/CNA believes the versions are comprehensive.
"knownVersionPrefixes": [ string ] // required if complete == false
}
When a version is not included in the list of versions ranges, it means that the version is
- unspecified if versionsInfo.complete is false.
- unaffected if versionsInfo.complete is true. A "status": "unaffected" is redundant in this case.
status: "unaffected" and status: "affected" ranges cannot overlap in any way.
When versionsInfo.complete is false, versionsInfo.knownVersionPrefixes must be specified with at least one prefix.
@chandanbn you also had "undefined, likely-affected, unlikely-affected" in your status, but I think these aren't needed because:
versionsInfo.complete is false)
An algorithm can give four possible results about an input version: "affected", "unaffected", "likely-affected", "likely-unaffected".
If versionsInfo.complete is true, checking if a version is "affected" just entails checking if the version is included in any provided version ranges (with status "affected"). Otherwise it's "unaffected".
If versionsInfo.complete is false, a version is still checked against all the provided version ranges.
If it matches a range, then it should be either "affected" or "unaffected" based on the range's status.
Otherwise, it's "unspecified".
If the version is unspecified at this point, then tooling can interpret it like so:
- If it matches one of versionsInfo.knownVersionPrefixes, then it's "unaffected".
- If it doesn't match any of versionsInfo.knownVersionPrefixes, and it's greater than or equal to max(before) in all ranges, then it's "likely-unaffected", because it likely indicates a version that came in a later branch.
@rsc @chandanbn what do you think? I think if we do it this way, we can also stick with a single versions list.
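A sketch of how the evaluation could look in code. It assumes dotted-integer versions, plain string-prefix matching for knownVersionPrefixes, and that every entry is a range with both "version" and "before". The final "likely-affected" fallback is inferred by elimination from the four possible results rather than stated explicitly, so treat it as an assumption.

```python
def parse(version):
    """Dotted-integer parse, e.g. "5.10.51" -> (5, 10, 51)."""
    return tuple(int(p) for p in version.split("."))

def in_range(entry, v):
    """Half-open range check with "*" as an open bound on either side."""
    lower_ok = entry["version"] == "*" or parse(entry["version"]) <= v
    upper_ok = entry["before"] == "*" or v < parse(entry["before"])
    return lower_ok and upper_ok

def evaluate(version, versions, versions_info):
    v = parse(version)
    for entry in versions:
        if in_range(entry, v):
            return entry.get("status", "affected")
    if versions_info["complete"]:
        return "unaffected"
    # Incomplete data: the version is "unspecified" here, so apply the heuristics.
    if any(version.startswith(p) for p in versions_info["knownVersionPrefixes"]):
        return "unaffected"
    upper_bounds = [parse(e["before"]) for e in versions if e["before"] != "*"]
    if upper_bounds and v >= max(upper_bounds):
        return "likely-unaffected"
    return "likely-affected"   # assumption: the one remaining case

VERSIONS = [
    {"version": "4.14.0", "before": "4.14.240"},
    {"version": "4.19.0", "before": "4.19.198"},
    {"version": "5.10.0", "before": "5.10.51"},
]
INFO = {"complete": False, "knownVersionPrefixes": ["4.14.", "4.19.", "5.10."]}
```

With this data, 4.14.250 is "unaffected" (known prefix, outside the range), 5.11.0 is "likely-unaffected", and 4.15.0 falls through to "likely-affected".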
@oliverchang I like an indicator of completeness (versionsInfo.complete).
versionsInfo.knownVersionPrefixes seems like an aggregation of versionGroups. Not sure if we are achieving anything by separating them out to a different field.
Having some guidance on how to record a versionGroup name should also help tooling. Prefix matching can be tough unless there is an odd looking period at the end (2.4 will match 2.41.3, so it should be either recorded as 2.4. or 2.4.*).
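The pitfall is easy to reproduce. The second helper below is one possible way around it (comparing whole dot-separated components), offered only as an illustration; both function names are hypothetical.

```python
def in_group_naive(version, group):
    """Plain string prefix test: "2.4" also matches "2.41.3"."""
    return version.startswith(group)

def in_group(version, group):
    """Compare whole dot-separated components instead of raw characters."""
    group_parts = group.rstrip(".*").split(".")   # accepts "2.4", "2.4." and "2.4.*"
    return version.split(".")[:len(group_parts)] == group_parts
```

`in_group_naive("2.41.3", "2.4")` is true, which is the false match described above, while `in_group` rejects it and still accepts 2.4.27 for any of the three spellings of the group.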
Prefix/glob matching may not work when a product does patching instead of semver:
product: 'Windows'
versions: [
versionGroup: '10', before: 'patch-6'
versionGroup: '11', before: 'patch-2'
]
versionsInfo.knownVersionPrefixes seems like an aggregation of versionGroups. Not sure if we are achieving anything by separating them out to a different field.
I think it simplifies the evaluation algorithm and prevents some edge cases when dealing with open ranges within a group.
e.g.
{"versionGroup": "4.14", "before": "*"}
{"versionGroup": "4.15", "before": "*"}
The interpretation here would be, everything in 4.14 and 4.15 is affected.
In the case this describes an incomplete set of versions, if we have "4.16.1", it should be "likely-unaffected" because it's newer than all versions, but there are no actual versions to compare it to in the two ranges (they're both "*"). There would have to be a way to compare "4.16.1" to an actual group ("4.15"), which seems difficult to do in a generalisable way.
It also adds complexity to evaluating these rules even if this describes a complete set of versions.
Having some guidance on how to record a versionGroup name should also help tooling. Prefix matching can be tough unless there is an odd looking period at the end (2.4 will match 2.41.3, so it should be either recorded as 2.4. or 2.4.*). Prefix/glob matching may not work when a product does patching instead of semver.
Sure, but I think since versionGroup/Prefix is essential to determining if a version is affected, it needs to be unambiguously computable by tooling. I think we will need either prefix (or pattern matching/regex) for that.
Re patching, perhaps another way would be to just have:
{version: '10', before: 'patch-6', "type": "patch"}
{version: '11', before: 'patch-7', "type": "patch"}
? That way, versionGroup/Prefix can have consistent automatable rules.
@chandanbn thanks for the example of the Linux kernel vulnerability. It looks like that bug may go back all the way to 2.6.12 and no one has taken the time to figure out exactly which versions are affected, which is a great case to try to encode.
@oliverchang and I spoke for a while and didn't come up with an obvious win yet. We'll circle back early next week.
This issue is about making version information computable, meaning that there is a clear algorithm IsVersionAffected that takes as input a CVE record and a specific version and answers the question “is this version affected by this CVE?”
There are two concerns: (1) defining something precise enough for an algorithm to implement, and (2) defining something clear enough that people writing these records - and also the people implementing the algorithm - get it right.
There are many, many ways to do (1) but relatively fewer ways to do (2).
We already have the problem of needing to define specific version types to make even a less-than comparison work. A versionGroup adds another kind of definition on top of that. Also, version groups assume a particular development model that may or may not hold. For example if v4 and v5 are being developed independently, then you might want to say that it is fixed in v4.19.2 onward within v4 (including v4.20 but not including v5) and then separately also fixed in v5 starting at v5.13.4.
It seems like it would be better to have fewer concepts if we can, which is to say leave versionGroup out if we can.
I think we should separate out point-wise assertions from ranges, because pointwise assertions don't require understanding the relative ordering of versions. Suppose we did this:
versionList: [{
version: specific version
status: unknown / affected / unaffected
}]
versionRanges: [{
type: string
initialStatus: unknown / affected / unaffected (optional; default unknown)
statusChanges: [{
version: version where status changes
status: unknown / affected / unaffected
}]
}]
This would replace both the affectedVersions and testedVersions in my previous attempt.
If a version appears explicitly in the version list, then the answer is the given status. That's the easy part.
Otherwise, we consult the ranges. Each range specifies the version type (semver, git, linux, etc) and an optional initial status and then a "timeline" ("versionline"?) of where the status changes. For the Linux kernel bug we could use:
versionRanges: [
{
type: linux
initialStatus: unaffected
statusChanges: [
{start: v3.16, status: affected}
{start: v4.19.198, status: unaffected}
{start: v4.20, status: affected}
{start: v5.13.4, status: unaffected}
]
}
]
This effectively encodes this picture of the version timeline:
| unaffected at start of timeline
|
|
o v3.16 changes to affected
X
X
X
o v4.19.198 changes to unaffected
|
|
|
o v4.20 changes back to affected
X
X
X
o v5.13.4 changes back to unaffected
|
|
| unaffected for rest of timeline
Normally you'd have only one versionRange for a given type. This particular issue might add a second range of type "git" to list the specific commit hashes.
The algorithm is to find the versionRange for the type of version you are holding and then do:
status = initialStatus
for c in statusChanges
    if version >= c.start
        status = c.status
return status
This seems pretty clear for both readers and programmers.
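As a rough illustration, the timeline lookup can be sketched in Python using the Linux example above. The function names and the parse helper are assumptions made for this sketch; it treats versions as plain dot-separated integers with an optional leading "v", which is much simpler than a real 'linux' or 'semver' ordering would be.

```python
def parse(version: str) -> tuple:
    """Assumed ordering: dot-separated integers, optional leading 'v'."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def timeline_status(version_range: dict, version: str) -> str:
    """Walk the status changes in order; the last one at or below `version` wins."""
    status = version_range.get("initialStatus", "unknown")
    v = parse(version)
    # Sort defensively, in case the record was not written in order.
    changes = sorted(version_range["statusChanges"], key=lambda c: parse(c["start"]))
    for change in changes:
        if v >= parse(change["start"]):
            status = change["status"]
    return status


LINUX_RANGE = {
    "type": "linux",
    "initialStatus": "unaffected",
    "statusChanges": [
        {"start": "v3.16", "status": "affected"},
        {"start": "v4.19.198", "status": "unaffected"},
        {"start": "v4.20", "status": "affected"},
        {"start": "v5.13.4", "status": "unaffected"},
    ],
}

for candidate in ["v3.15", "v4.19.100", "v4.19.200", "v5.12.10", "v5.13.4"]:
    print(candidate, timeline_status(LINUX_RANGE, candidate))
```

Under these assumptions v5.12.10 comes out as affected, because the last change at or below it is the one at v4.20.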
I think this encodes the ranges clearly and without the duplication that's needed for a list of [start,before) spans (where each one's before is usually the next one's start).
It also explicitly allows status "unknown" (and makes that the default), and we could add status "likely" or "probable" if necessary.
Thoughts?
@rsc Wouldn't this be essentially restricting the use of existing versionAffected to '>=', '!>='? If that restriction yields less ambiguous and more machinable records then reduction in expressibility is ok.
if version >= c.start
Isn't the comparison here still the version-tree (directed acyclic graph) based comparison?
For git, one must query the SCM to find whether one commit hash is before or after another commit hash. Since we capture the git repo URL, I feel this is computable.
For semvers or anything else, I see a few requirements:
BTW, for the Linux kernel example above only the seven fixed branches seem to be tracked. The sum total of Affected versions (aggregated from those 7 ids in OSV) would miss any version from an unmaintained Linux kernel branch (such as 5.12.10). However using the suggested record format and the algorithm querying the SCM (git repo) on git commit ids, one would in theory correctly identify 5.12.10 as affected.
@rsc Wouldn't this be essentially restricting the use of existing versionAffected to '>=', '!>='? If that restriction yields less ambiguous and more machinable records then reduction in expressibility is ok.
I suppose it's restricting the use to purely a sequence of '>=', with the rule that later entries override earlier ones. And yes, I think that that restriction makes the records easier to interpret and probably also easier to write.
Isn't the comparison here still the version-tree (directed acyclic graph) based comparison?
Yes, the comparison has to be defined by the 'type' entry in the range object. If the type is 'semver' then https://semver.org defines ordering. If the type is 'git' then ordering can only be checked with respect to the actual repo. And we can define other numeric types (I assumed a 'linux' type above) as needed. We might want to define a 'dotted' type that is only for dot-separated numbers, with the obvious meaning. (All the subtlety about semver etc happens when you get to variations like 1.2-3 or 1.2rc5.)
the list has to be first sorted on start versions (easy).
Agreed.
should have at least one entry for the start of every branch if the previous branch had a fix, and this has to be the first version of that branch (hard, because not everyone may recollect the first version in a branch).
Agreed. And that really is a concern, but we could potentially define that in the semver ordering you can write 4.20 (no third number) to mean anything starting with 4.20, including prereleases.
BTW, for the Linux kernel example above only the seven fixed branches seem to be tracked. The sum total of Affected versions (aggregated from those 7 ids in OSV) would miss any version from an unmaintained Linux kernel branch (such as 5.12.10).
Yes, I agree with that. I don't think the 7 different IDs are a good approach. It actually makes it almost impossible to say what is and is not affected. @oliverchang is going to talk to the UVI team about why they chose that approach. We should strive for a single ID in CVE.
However using the suggested record format and the algorithm querying the SCM (git repo) on git commit ids, one would in theory correctly identify 5.12.10 as affected.
Yes, and one of the things we hope OSV will be able to contribute to the CVE ecosystem once data is in this format is suggesting updates where the git commits indicate that the numeric version ranges can be made more precise.
Regarding "sorted on start versions (easy)":
I hope that CVE records will be written with sorted lists anyway, perhaps with automation to keep them sorted, but I agree that clients should be expected to sort too.
(Technically speaking it is not necessary for the client to sort, only to find the status line with the largest version <= the version being checked. That's O(n) instead of O(n log n). But I think it is fine to say that clients should behave as if they sorted the list and leave not sorting as an optimization.)
Most versioning numbering systems have a clear linear ordering: v1.2.3 before v1.2.4 before v1.3.0 before v2.0.0. Sorting is indeed easy there.
For a Git commit graph, all we can do is sort by topological order (parents before children). That's still easy, it's just important to recognize it as not quite normal sorting. The algorithm and the data format still make sense for this kind of directed acyclic graph. For example the Git commit ranges for CVE-2021-33909 would be written:
type: git
repo: https://url
initialStatus: unaffected
statusChanges: [
{status: affected, start: 058504edd02667eef8fac9be27ab3ea74332e9b4}
{status: unaffected, start: 3533e50cbee8ff086bfa04176ac42a01ee3db37d}
{status: unaffected, start: c5157b3e775dac31d51b11f993a06a84dc11fc8c}
{status: unaffected, start: 3c07d1335d17ae0411101024de438dbc3734e992}
{status: unaffected, start: 6de9f0bf7cacc772a618699f9ed5c9f6fca58a1d}
{status: unaffected, start: c1dafbb26164f43f2bb70bee9e5c4e1cad228ca7}
{status: unaffected, start: 174c34d9cda1b5818419b8f5a332ced10755e52f}
{status: unaffected, start: 058504edd02667eef8fac9be27ab3ea74332e9b4}
]
This turns out to be a clear improvement over the original ranges, because you don't have to say the commit that introduced the bug 7 times.
Maybe the best approach is to have multiple options for expressing version information, depending (in part) on whether the product has a support policy (explicit or implied). The type of information submitted to the CVE Program tends to have a bifurcation depending on whether a support policy exists, even when the existence of a support policy is not mentioned within the vulnerability announcement itself.
Although CVE is not really "about" prescriptive information from vendors, it may be more likely for vendors to participate if the information displayed in CVE Records, and the information available to CVE-based tools, is closely aligned to what the vendor provides directly to customers, either within vulnerability announcements or during customer-support interactions. In other words, the approach potentially helps with CVE adoption.
The hope is to develop the best practical algorithm within the context of what data providers have traditionally been willing to submit to the CVE Program. It should avoid soliciting extra information such as "{start: v4.20, status: affected}" which, in practice, is very rare to see from program participants. For example, many people who rely on the 4.19.* longterm-supported Linux kernel series are unaware of whether 4.20.x ever existed (or whether 5.0 came right after a 4.19.x version). Similarly, if a vulnerability announcement mentions a 3.4.x fix and a 3.6.x fix, does that mean that 3.5.x is "affected" and potentially important, or does it mean that odd minor-version numbers are never visible outside of the development staff?
CVE Records are for vulnerabilities in released software. For purposes of CVE, it is not necessary to state which commits are associated with the vulnerability lifecycle, or to express whether any specific pre-release software came before or after a released version.
Here is a very rough outline of how the schema could accept four different major types of version specification.
Semantics:
If the consumer's product version does not match any of the assessedSemverRegexp regular expressions, then the output of the algorithm is the word Unsupported. This means that the vendor is recommending against use of that version. For vulnerability management purposes, this may be treated the same as the word Affected.
Otherwise, if one regular expression is matched, and assessmentPending is found, then the output of the algorithm is the word Unknown. Otherwise, if one regular expression is matched, and the consumer's product version is greater than or equal to the fixedStartingFrom value, then the output of the algorithm is the word Fixed. Otherwise, if one regular expression is matched, and the consumer's product version is within any specified otherUnaffected range, then the output of the algorithm is the word Fixed. Otherwise, the output of the algorithm is the word Affected.
Note: otherUnaffected is optional. Although producers are free to choose their own use cases, the envisioned primary use case is a situation where the vulnerability was introduced in a very recent version. Thus, there are expected to be many customer deployments that are completely safe (e.g., not affected by any CVE or any vulnerability that was silently fixed by the vendor), and therefore it's a waste of customer effort to trigger updates. In one example below, only 4.9.359 was affected. Commercial software vendors typically only express the version numbers of new versions that have fixed a vulnerability. From the perspective of many commercial software vendors, a vulnerability announcement has two purposes: to protect customers from attacks, and to lower support costs by reducing the variety of versions deployed in the field.
example with only one assessedSemverRegexp item
{"assessedSemverRegexp": ".", "fixedStartingFrom": "20.1.34"}
example with multiple assessedSemverRegexp items
{"assessedSemverRegexp": "^5\\.", "fixedStartingFrom": "5.0.0"}
{"assessedSemverRegexp": "^4\\.14\\.", "fixedStartingFrom": "4.14.250", "otherUnaffected": [{"semverBegin": "4.14.0", "semverEnd": "4.14.0"}, {"semverBegin": "4.14.50", "semverEnd": "4.14.89"}]}
{"assessedSemverRegexp": "^4\\.9\\.", "fixedStartingFrom": "4.9.360", "otherUnaffected": [{"semverBegin": "4.9.0", "semverEnd": "4.9.358"}]}
{"assessedSemverRegexp": "^4\\.4\\.", "assessmentPending": true}
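To make the semantics above concrete, here is a minimal Python sketch of the decision procedure applied to the multi-item example. The function name assess and the parse helper are hypothetical; the sketch assumes plain dot-separated integer versions, inclusive otherUnaffected ranges, and that the first matching regular expression decides the result.

```python
import re


def parse(version: str) -> tuple:
    """Assumed ordering: dot-separated integers only (no prerelease tags)."""
    return tuple(int(part) for part in version.split("."))


def assess(items: list, version: str) -> str:
    v = parse(version)
    for item in items:
        if not re.search(item["assessedSemverRegexp"], version):
            continue
        if item.get("assessmentPending"):
            return "Unknown"
        if "fixedStartingFrom" in item and v >= parse(item["fixedStartingFrom"]):
            return "Fixed"
        for span in item.get("otherUnaffected", []):
            if parse(span["semverBegin"]) <= v <= parse(span["semverEnd"]):
                return "Fixed"
        return "Affected"
    # No regular expression matched: the vendor did not assess this version.
    return "Unsupported"


ITEMS = [
    {"assessedSemverRegexp": r"^5\.", "fixedStartingFrom": "5.0.0"},
    {"assessedSemverRegexp": r"^4\.14\.", "fixedStartingFrom": "4.14.250",
     "otherUnaffected": [{"semverBegin": "4.14.0", "semverEnd": "4.14.0"},
                         {"semverBegin": "4.14.50", "semverEnd": "4.14.89"}]},
    {"assessedSemverRegexp": r"^4\.9\.", "fixedStartingFrom": "4.9.360",
     "otherUnaffected": [{"semverBegin": "4.9.0", "semverEnd": "4.9.358"}]},
    {"assessedSemverRegexp": r"^4\.4\.", "assessmentPending": True},
]

for candidate in ["5.1.0", "4.14.100", "4.9.359", "4.4.12", "3.2.1"]:
    print(candidate, assess(ITEMS, candidate))
```

With this data only 4.9.359 is reported as Affected within the 4.9 series, which matches the note that only that version was affected there.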
Semantics: if the customer's product version does not equal any of the assessedBaseVersion values, then the output of the algorithm is the word Unsupported. For vulnerability management, this may be treated the same as the word Affected. Otherwise, if the customer's product version equals one of the updateOptions values, or equals one of the otherUnaffected values, then the output of the algorithm is the word Fixed. Otherwise, the output of the algorithm is the word Affected. Clearly, vendors who don't (or can't) provide updateOptions values will trigger many false positives (if the CVE List is the sole data source for vulnerability assessment).
This is primarily for vendors who submit CVE Records that state a set of product versions, each of which may be vulnerable depending on whether an update action has occurred (e.g., installing a service pack, fix pack, hotfix, patch, etc.). In many cases, the CVE Record does not fully describe the update action (possibly because that action is dynamically chosen based on details of a customer environment). Thus, updateOptions (a set of update actions, any of which is sufficient to fix the vulnerability) can be specified, but is optional.
example in which updateOptions is not provided
{"assessedBaseVersion": "2.0"}
{"assessedBaseVersion": "3.0"}
{"assessedBaseVersion": "3.5"}
examples in which updateOptions is provided
{"assessedBaseVersion": "3.0", "updateOptions": ["3.0 HF17", "3.0 SP1 HF6"], "otherUnaffected": ["3.0 HF1", "3.0 HF2", "3.0 HF3"]}
{"assessedBaseVersion": "10", "updateOptions": ["October 2021 monthly updates", "23456"]}
Semantics
If the consumer's product version was tested and found to be affected, then the output of the algorithm is the word Affected. If the consumer's product version was tested and found to be not affected, then the output of the algorithm is the word Fixed. Otherwise, the output of the algorithm is the word Unknown.
A. examples that may be typical of automated testing
{"semverTestCases": [{"semverBegin": "3.0.0", "semverEnd": "3.15.12"}, {"semverBegin": "4.0.0", "semverEnd": "4.3.8"}], "affected": ["3.3.1", "3.3.2", "3.3.3", "3.3.4"]}
{"semverTestCases": [{"semverBegin": "1.0.0", "semverEnd": "22.3.1"}], "affected": ["5.6.2"]}
B. examples that may be typical of manual testing
{"semverTestCases": [{"semverBegin": "4.0.6", "semverEnd": "4.0.6"}, {"semverBegin": "5.0.3", "semverEnd": "5.0.3"}], "affected": ["4.0.6", "5.0.3"]}
{"semverTestCases": [{"semverBegin": "4.0.6", "semverEnd": "4.0.6"}, {"semverBegin": "5.0.0", "semverEnd": "5.0.2"}], "affected": ["4.0.6"]}
{"miscTestCases": ["Zeta", "January 2024", "LMNOP"], "affected": ["Zeta", "January 2024", "LMNOP"]}
Semantics
If the consumer's product version is a semver on the unaffectedSemverList, or a later semver, or a version on an unaffectedList, then the output of the algorithm is the word Fixed. Otherwise, if the consumer's product version is in the specificAffected field, then the output of the algorithm is the word Affected. Otherwise the output of the algorithm is the word Unknown (possibly accompanied by a comment).
{"unaffectedSemverList": ["5.12.16"], "specificAffected": [], "commentOnAffected": "at least one earlier version"}
{"unaffectedSemverList": ["5.12.16"], "specificAffected": [], "commentOnAffected": "likely to be few earlier versions"}
{"unaffectedSemverList": ["5.12.16"], "specificAffected": [], "commentOnAffected": "likely to be many earlier versions"}
{"unaffectedList": ["Phi"], "specificAffected": ["Upsilon", "Tau"], "commentOnAffected": "likely to be few earlier versions"}
{"unaffectedList": ["Phi"], "specificAffected": ["Upsilon", "Tau"], "commentOnAffected": "likely to be many earlier versions"}
The hope is to develop the best practical algorithm within the context of what data providers have traditionally been willing to submit to the CVE Program.
For what it's worth, this seems self-defeating to me. Yes, we have to be able to cope with what vendors provide, but for vulnerability management to scale industry-wide, we also need to encourage more precise data than the current English text.
I think the idea of comments and suggested upgrades are interesting, but those could be added to the proposed object in a separate discussion. (This is definitely a benefit of an object.)
Finally, speaking from experience, regular expressions are not a good answer: they are far too easy to embed subtle bugs in and too hard to scrutinize for those bugs. We should probably avoid them here.
A few comments, some of which have already been discussed but I didn't see a clear decision:
Another approach to the "Tested" list is to just stick with affected/not affected but identify the subject of the claim. Researcher can state that "version 1.1 is affected" and supplier/vendor/project can state "version 1.1 is not affected" and I can parse out that there's a disagreement and I need to go investigate. This avoids giving the vendor/project/supplier ultimate authority in the claim, in that researcher testing is inferior to vendor statements (this might often be true, but not always, to a non-trivial degree).
If comprehensive testing is not assumed (i.e., not listed as affected == not affected), then a way to convey "Not affected" is useful. In this model, unlisted version implies nothing, there needs to be an explicit statement of affected or not.
And another list, "Supported" (and possibly unsupported).
As a consumer of this information, I'd like to know who is making the claim, what version/ranges are affected, what are not, what is unknown, and what is unsupported (or wontfix).
The tested/affected separation was partly to have two different default statuses (untested/unknown for tested, unaffected for affected), but that ended up more confusing than helpful. Instead in the latest suggestions there is always an explicit status, which can be unaffected/affected/unknown. We could potentially think about adding an explicit unsupported, since that seems to be the most common reason for unknown.
I believe the information about who is making the claim is supposed to be from 'requester' elsewhere in the record, and then there is the adpContainer for extra statements by others. If additional clarity is needed around authorship, it seems like that should be a separate issue discussion from version details.
Hello all. I took away from the discussion at the last QWG meeting that:
As I noted before, the trick is to balance (1) defining something precise enough for an algorithm to implement, and (2) defining something clear enough that people writing these records - and also the people implementing the algorithm - get it right.
Here is a new potential schema incorporating that feedback and that I hope is still a reasonable balance of (1) and (2):
versions: [{
version: $version
status: $status // unknown, affected, unaffected; unsupported?
range: string (‘semver’, ‘git’, ..., to define meaning of <)
repo: string (optional for range ‘git’)
limit: $versionLimit (this range stops just before limit; can use * for “infinity” aka "maxuint")
changes: [{
at: version where status changes
status: ...
}]
}]
An object in the versions list can be either:
The algorithm for deciding the status of a particular version v is then:
for entry in versions
    if entry.limit is not present and v == entry.version
        return entry.status
    if entry.limit is present and v >= entry.version and v < entry.limit
        status = entry.status
        for change in entry.changes
            if v >= change.at
                status = change.status
        return status
return “unknown”
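A minimal Python sketch of this lookup follows, under stated assumptions: versions are dot-separated integers, and a "*" component in a limit is modelled as infinity so that "3.*" sorts after every 3.x version. The names parse and status_of, and the sample data, are illustrative only and not part of the schema.

```python
import math


def parse(version: str) -> tuple:
    """Assumed ordering: dot-separated integers; '*' acts as infinity."""
    return tuple(math.inf if part == "*" else int(part) for part in version.split("."))


def status_of(versions: list, version: str) -> str:
    v = parse(version)
    for entry in versions:
        start = parse(entry["version"])
        if "limit" not in entry:
            # A single-version entry matches only that exact version.
            if v == start:
                return entry["status"]
            continue
        if start <= v < parse(entry["limit"]):
            status = entry["status"]
            for change in entry.get("changes", []):
                if v >= parse(change["at"]):
                    status = change["status"]
            return status
    return "unknown"


VERSIONS = [
    {"version": "3.0", "limit": "3.*", "range": "semver", "status": "affected",
     "changes": [{"at": "3.4", "status": "unaffected"}]},
    {"version": "4.0", "limit": "4.*", "range": "semver", "status": "affected"},
    {"version": "5.0", "limit": "*", "range": "semver", "status": "affected",
     "changes": [{"at": "5.2", "status": "unaffected"}]},
]

for candidate in ["2.9", "3.2", "3.4.1", "4.7", "5.3"]:
    print(candidate, status_of(VERSIONS, candidate))
```

A real implementation would select the comparison from each entry's range field rather than using one ordering for everything.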
The rest of this comment gives worked examples for the cases in Chandan’s presentation as well as a Git-based case that UVI wants to be able to encode that was part of the motivation for the previous iteration of the schema.
versions: [
{
version: 1.1, limit: 1.*, range: semver,
status: affected,
changes: [
{at: 1.6, status: unaffected}
]
}
]
versions: [
{
version: 1.1, limit: 1.*, range: semver,
status: unknown,
changes: [
{at: 1.4, status: affected},
{at: 1.6, status: unaffected}
]
}
]
Notes:
limit: 1.* would become limit 1.9.
versions: [
{
version: 3.0, limit: 3.*, range: semver,
status: affected,
changes: [{at: 3.4, status: unaffected}]
},
{
version: 4.0, limit: 4.*, range: semver,
status: affected,
},
{
version: 5.0, limit: *, range: semver,
status: affected,
changes: [{at: 5.2, status: unaffected}]
}
]
Notes:
versions: [
{
version: 3.0, limit: 3.*, range: semver,
status: unaffected,
changes: [
{at: 3.3, status: affected},
{at: 3.5, status: unaffected}
]
},
{
version: 4.0, limit: 4.*, range: semver,
status: unaffected,
},
{
version: 5.0, limit: *, range: semver,
status: unaffected,
changes: [
{at: 5.2, status: affected},
{at: 5.4, status: unaffected}
]
}
]
versions: [
{
version: 3.0, limit: 3.0-*, range: patch,
status: unaffected,
changes: [
{at: 3.0-patch-C, status: affected},
{at: 3.0-patch-E, status: unaffected}
]
},
{
version: 4.0, limit: 4.0-*, range: semver,
status: unaffected,
},
{
version: 5.0, limit: 5.0-*, range: semver,
status: unaffected,
changes: [
{at: 5.0-patch-A, status: affected},
{at: 5.0-patch-C, status: unaffected}
]
}
]
Notes:
We also want this to work well for version control revision information. Here is a simplified version of the Linux bug:
The bug was introduced in commit 1234, which was first released in v3.16. It was later fixed twice, in 4567 which landed in v4.19.198 and in 6789 which landed in v5.13.4.
We can represent this situation with:
versions: [
{
version: 3.0, limit: 3.*, range: linux,
status: affected,
},
{
version: 4.19, limit: 4.19.*, range: linux,
status: affected,
changes: [{at: 4.19.198, status: unaffected}]
},
{
version: 5.13, limit: 5.13.*, range: linux,
status: affected,
changes: [{at: 5.13.4, status: unaffected}]
},
{
version: 1234, range: git,
repo: https://github.com/torvalds/linux,
status: affected,
changes: [
{at: 4567, status: unaffected},
{at: 6789, status: unaffected}
]
}
]
The last version object describes the precise git commit ranges. Anything after hash 1234 is affected, except that commits starting at 4567 and at 6789 (on different branches) are unaffected. This makes clear that future extensions of the v4 and v5 branch are unaffected, while commit 7890 is still affected. This encoding is the way most vulnerabilities with a single introduction but multiple branched fixes would encode the version control graph.
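The same evaluation can be sketched for a commit graph in Python, where "v >= x" is read as "x is v itself or an ancestor of v". Everything here is hypothetical: the toy history, the placeholder commit names (0001, v4a, v4b, v5a), and the function names. A real tool would query the repository instead of a hard-coded parent map, and the sketch assumes changes are listed in topological order.

```python
def is_ancestor_or_same(parents: dict, ancestor: str, commit: str) -> bool:
    """True if `ancestor` is `commit` or reachable from it through parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def git_status(parents: dict, entry: dict, commit: str) -> str:
    # Commits that do not descend from the starting commit are outside the range.
    if not is_ancestor_or_same(parents, entry["version"], commit):
        return "unknown"
    status = entry["status"]
    for change in entry.get("changes", []):
        if is_ancestor_or_same(parents, change["at"], commit):
            status = change["status"]
    return status


# Toy history: 1234 introduces the bug; 4567 and 6789 fix it on two branches.
PARENTS = {
    "1234": ["0001"],
    "v4a": ["1234"], "4567": ["v4a"], "v4b": ["4567"],
    "v5a": ["1234"], "6789": ["v5a"],
    "7890": ["1234"],  # branch that never received a fix
}
ENTRY = {"version": "1234", "range": "git", "status": "affected",
         "changes": [{"at": "4567", "status": "unaffected"},
                     {"at": "6789", "status": "unaffected"}]}

for commit in ["0001", "v4a", "v4b", "6789", "7890"]:
    print(commit, git_status(PARENTS, ENTRY, commit))
```

In this toy graph 7890 stays affected because neither fix commit is among its ancestors.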
For the specific case of Linux, the UVI project wants to treat vulnerabilities on different kernel version branches as completely different vulnerabilities, as a matter of policy, essentially treating different kernel versions as different products. (Although I think this is a mistake in this case, perhaps there are other contexts where it makes sense, so it’s worth examining how to do it.)
The obvious encoding is to write this in the vulnerability entry for the v4 “product”:
versions: [
{
version: 1234, range: git,
status: affected,
changes: [{at: 4567, status: unaffected}]
}
]
And this for the vulnerability entry for the v5 “product”:
versions: [
{
version: 1234, range: git,
status: affected,
changes: [{at: 6789, status: unaffected}]
}
]
The problem with this pair of vulnerability entries is that according to the v4 entry, 6789 is affected, and according to the v5 entry, 4567 is affected. So every kernel commit after 1234 is going to appear to be affected by at least one of these entries. Again, that’s the right default behavior: in the complete version in the previous example, we definitely want to identify 7890, on an unfixed branch, as affected. The problem here is that v5 appears to be an “unfixed branch” for the v4 vulnerability, and vice versa.
We can fix this problem by using limit (just like above) to limit the effect to a single branch. In this case, a limit L for a git range would mean the range only applies to commits that are on the branch leading to L (meaning they are ancestors of L). This is the same “only before” meaning of limit as in the semver limits.
That is, we can write:
versions: [
{version: 1234, limit: 4567, range: git, status: affected},
{version: 4567, range: git, status: unaffected},
]
and
versions: [
{version: 1234, limit: 6789, range: git, status: affected},
{version: 6789, range: git, status: unaffected},
]
This form has the downside of not making clear that 7890 and other off-v4, off-v5 commits are affected, which is why I think UVI’s policy is a mistake. But if that is the policy someone needs to encode, then the new limit field provides a way to do that.
I have posted the schema pull request for reference, but discussion is probably better here than on the PR.
One concern about this timeline event model is that there's a race condition involving relevant anonymous events. This is perhaps hard to explain, so I've started with examples. I've also suggested a small change that can fix the problem in, at least, some realistic situations. The change is to stop hardcoding 'return "unknown"' at the end of the algorithm, and let the author of the CVE Record choose to return whatever valid status they want. I feel that this will make data entry easier and less error-prone, and probably increase the number of data providers willing to provide computable information.
Currently, a typical versions key can have:
versions: [
{
version: 3.0, limit: 3.*, range: semver,
status: affected,
changes: [{at: 3.4, status: unaffected}]
},
{
version: 4.0, limit: 4.*, range: semver,
status: affected,
},
{
version: 5.0, limit: *, range: semver,
status: affected,
changes: [{at: 5.2, status: unaffected}]
}
]
The small change is to put the array of entries inside an object:
versions: { default: myDefault, entries:
[
{
version: 3.0, limit: 3.*, range: semver,
status: affected,
changes: [{at: 3.4, status: unaffected}]
},
{
version: 4.0, limit: 4.*, range: semver,
status: affected,
},
{
version: 5.0, limit: *, range: semver,
status: affected,
changes: [{at: 5.2, status: unaffected}]
}
]
}
Also, the bottom of the algorithm changes from:
return "unknown"
to:
if versions.default is present
    return versions.default
else
    return "unknown"
For example, consider the following realistic scenario. A vulnerability is being announced although no fix is yet shipping. The data provider knows the exact status of every version that has ever existed. Specifically, the vulnerability announcement states that 2.8.0 and later 2.8.x versions are affected, 3.0.0 and later 3.0.x versions are affected, and no others are (or will be) affected. It also states that a fix will be available later, and will be shipped with a version number of either 3.1.0 or 4.0.0 (those are the only two possibilities; it just depends on whether there will be an incompatible API change). Furthermore, it states that no more 2.x versions will be shipped (that series ended at 2.8.x) and no more 3.0.x versions will be shipped. Finally, it states that the fix (in either 3.1.0 or 4.0.0) will be effective going forward, because the entire problematic code component is being removed.
Apparently this could be expressed as:
versions: [
{
version: 0.0.0, limit: *, range: semver,
status: unaffected,
changes: [{at: 2.8.0, status: affected}, {at: 3.1.0, status: unaffected}]
}
]
(or in less compact ways that have the same downsides). To express this, it was necessary to refer to two versions that may or may not be real (0.0.0 and 3.1.0). The algorithm always produces correct results. However, the CVE Record data is hard for a human to produce (they need to reason about the algorithm before ultimately deciding that those unconfirmed version numbers - 0.0.0 and 3.1.0 - are the best way forward). The CVE Record data is also potentially misleading to later human readers, who might think it implies that 3.1.0 was released even if the developers had actually decided to go with 4.0.0 instead of 3.1.0. Also, the SemVer specification is ambiguous about whether there is a reasonable way (such as 0.0.0) to express a lower bound (it says "The simplest thing to do is start your initial development release at 0.1.0" and 0.0.0-alpha is also a valid choice).
With the proposed change, the data provider can simply write:
versions: { default: unaffected, entries:
[
{
version: 2.8.0, limit: 2.8.*, range: semver,
status: affected,
},
{
version: 3.0.0, limit: 3.0.*, range: semver,
status: affected,
}
]
}
Here, regardless of whether the fix is shipped in 3.1.0 or 4.0.0, the data provider has no need to ever update the CVE Record. The CVE Record only refers to real versions. It is simple to reason that this is a correct data representation for the algorithm.
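A minimal Python sketch of the lookup with the proposed default key, applied to the record above, might look like this. It assumes dot-separated integer versions with "*" in a limit modelled as infinity; the names parse and status_with_default are illustrative and not part of the proposal.

```python
import math


def parse(version: str) -> tuple:
    """Assumed ordering: dot-separated integers; '*' acts as infinity."""
    return tuple(math.inf if part == "*" else int(part) for part in version.split("."))


def status_with_default(versions: dict, version: str) -> str:
    v = parse(version)
    for entry in versions["entries"]:
        start = parse(entry["version"])
        if "limit" not in entry:
            if v == start:
                return entry["status"]
            continue
        if start <= v < parse(entry["limit"]):
            status = entry["status"]
            for change in entry.get("changes", []):
                if v >= parse(change["at"]):
                    status = change["status"]
            return status
    # Fall back to the record's own default instead of a hardcoded "unknown".
    return versions.get("default", "unknown")


VERSIONS = {
    "default": "unaffected",
    "entries": [
        {"version": "2.8.0", "limit": "2.8.*", "range": "semver", "status": "affected"},
        {"version": "3.0.0", "limit": "3.0.*", "range": "semver", "status": "affected"},
    ],
}

for candidate in ["2.0.0", "2.8.3", "3.0.1", "3.1.0", "4.0.0"]:
    print(candidate, status_with_default(VERSIONS, candidate))
```

Under these assumptions both 3.1.0 and 4.0.0 resolve to unaffected without the record ever naming either version.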
To align this with the terminology introduced at the beginning of this comment:
"a fix will be available later, and will be shipped with a version number of either 3.1.0 or 4.0.0" is an anonymous event. The existence of this event is clearly relevant to the end of the 3.x affected series, but we don't yet know whether it's going to be named an "at 3.1.0 event" or named an "at 4.0.0" event.
It is, of course, completely normal for a CNA to publish a CVE Record before the fix is shipped, and for end users to begin to do vulnerability assessment on the basis of that CVE Record.
Now, one might argue that the anonymous event isn't relevant to these end users. Neither 3.1.0 nor 4.0.0 exists yet, and thus changes: [{at: 2.8.0, status: affected}] is sufficient for vulnerability assessment. Anyone running 2.8.0 or any later version is vulnerable at this point in the release cycle.
This, however, has a race condition between the software release process and the CVE Record update process. The people publishing a release (and the customers updating to that new release) might be much more diligent than the person maintaining the CVE Record, with the result that thousands of customers will get false positives starting from the day that the new release is published. Thus, it's a bad idea to ever have changes: [{at: 2.8.0, status: affected}] at the end of the timeline.
The CVE Record author can force an "unknown" result for everything after 3.0.x, but that's really not much better than the false positive. End users want a result of "unaffected" as soon as they update to the fixed version.
Of course, the CVE Record author can work around this by guessing 3.0.1 as the name of the anonymous event, but that's confusing both on the producer side and on the consumer side. And, conceptually, that guessing is useful to nobody. The vulnerability-assessment facts are completely known in advance: only 2.8.x and 3.0.x versions are vulnerable.
This proposed "default" key also has important use cases for other status values (not only for "unaffected"). If the working group decides to add "unsupported" to the valid status values, then any data provider could choose "unsupported" as their default in any CVE Record, in contexts where other data providers may have relied on "unknown" instead. (For example, the data provider implicitly relied on "unknown" for version 2.0 in the example at the top of this comment.)
I was not part of the discussion, so this may feel off topic; my comments below may be entirely obvious to you; if so, please ignore this!
I came to appreciate that version ranges can only ever be an approximation; and that a complete enumeration of all affected versions is the only correct statement; this was based on insightful comments by @oliverchang and @rsc made elsewhere.
IMHO there is no such thing as a "computable version identification" that works in all cases.
One possible exception may be crypto-bound closed version ranges like commit hashes. In all other cases I can fathom, affected and unaffected versions can be inserted in a range after the fact; a range may be resolved correctly as intended today; it may be incorrect tomorrow when new versions may be squeezed in the range even with semver: we are mere humans releasing software and we may deviate at times from whatever clean version range scheme we say we are using.
Because of this --for a vulnerability database that I co-maintain-- we are evolving our vulnerability data structures to store:
Both are optional, and the enumeration is the only thing that is certain.
The ranges are hints for tools and humans to re-evaluate and update the concrete affected versions such as when there are new releases of the package or product at hand. And when this re-evaluation or review happens this can lead to:
In practice, when there is a new version that is in an affected range and not yet enumerated, this means that the version MAY BE affected, short of other info. Until tested (by tools or by humans, fuzzing, code analysis or else) that's the best that can be said; and when tested, a version becomes enumerated.
I am suggesting using a similar approach and stop trying to make version ranges first class concepts. Rather:
In this approach, it is OK to have no enumeration when we do not know yet (and for the shy vendor that does not want to disclose).
When users are reviewing vulnerabilities in their list of (package|products)/versions, they can get two bits of information:
If the ranges are treated as hints (and not mixed with the concrete resolved list of versions), it is still important to get their updated grammar and syntax right, but this could become a lesser issue as this would NOT be the primary, default way to get versions... but just a hint.
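A rough sketch of that two-tier lookup, assuming an enumerated set of affected versions plus a list of half-open hint ranges. The function names, the "not listed" result, and the integer-tuple ordering are illustrative assumptions, not part of any existing schema:

```python
def parse(version):
    """Illustrative ordering: dotted numeric versions compared as integer tuples."""
    return tuple(int(part) for part in version.split("."))


def review(version, enumerated_affected, hint_ranges):
    """Return what can be said about `version` from the enumeration and the hints."""
    # The enumeration is the only certain statement, so it is checked first.
    if version in enumerated_affected:
        return "affected"
    # A version inside a hint range but not enumerated has not been reviewed yet.
    v = parse(version)
    for low, high in hint_ranges:
        if parse(low) <= v < parse(high):
            return "may be affected"
    # Assumption: anything else is simply not covered by this record.
    return "not listed"


enumerated = {"1.2.0", "1.2.1"}
hints = [("1.2.0", "1.3.0")]  # half-open: 1.2.0 <= v < 1.3.0

for candidate in ("1.2.1", "1.2.2", "1.3.0"):
    print(candidate, review(candidate, enumerated, hints))
```

Once a version such as 1.2.2 has been tested, it would move into the enumeration and the hint would no longer be consulted for it.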
@pombredanne you are right. The aim here is to capture the hints in a way that is less ambiguous for tools and humans. There should be less room for misinterpretation with fewer false negatives and false positives.
For open-source projects with a public git repo, commit hashes, and tagged versions, an automated service can help generate (and refresh) a list of concrete vulnerable versions.
@pombredanne and @chandanbn, for what it's worth, I disagree that ranges are only human hints and can never be treated as precise by computers. It's true that you have to be careful to make them precise, and in particular you need to say what the numbering system is (versionType here) and have that system be well-defined. If it's not, then yes, the best you can do is an enumeration, perhaps sanity checked by a version control range.
In Go in particular (which uses semver numbering), it is possible to generate a semver version corresponding to each commit to a repo. It would not make sense to require a CVE to enumerate every single commit when a simple (and much shorter) range can be specified instead. But we could still have git ranges and semver ranges and cross-check the meaning of the semver ranges against the git ranges.
The required enumeration is also problematic for commercial software when a vendor wants to say "fixed in 5.2" and not enumerate all the prior versions that were affected. A range makes that easy to express. There may be no complete enumeration.
I agree that it can be a fine approach to do both the enumeration and the ranges and have some kind of automation to cross-check them - or a semver range and a git range, again cross-checked. That works especially well for open source. But I don't believe that approach can be required of every situation. (One thing I've come to appreciate from all these discussions is the sheer breadth of situations that CVE must be able to capture.)
@ElectricNroff, if the vendor has guaranteed all those things, I don't see a problem with the as-yet-nonexistent version 3.1.0 in:
versions: [
{version: 0, limit: 3.0.*, range: semver, status: affected},
{version: 3.1.0, limit: *, range: semver, status: unaffected}
]
Generally speaking, predicting the future is hard. Instead of layering additional ways to set down predictions about the future, it seems much better to make it easy for vendors to update their CVE records as new facts become known. After all, it is also true that customers may pressure the vendor to issue a fix in the 3.0 branch after all. No amount of encoding the future can account for actual changes to the expected future. Instead, we should make it easy for vendors to amend their CVE records. So it also seems fine if the vendor chooses to issue a CVE with:
versions: [
{version: 0, limit: *, range: semver, status: affected}
]
and then amend the record later when fixes come out.
Changes in latest PR, based on Tuesday meeting discussion:
Latest commit message summary:
The shorthand version of this schema is:
defaultStatus: $status
versions: [{
version: $version
status: $status // unknown, affected, unaffected
versionType: string (‘semver’, ‘git’, ..., to define meaning of <)
repo: string (optional, intended for versionType ‘git’)
lessThan/lessThanOrEqual: $version (can use * for “infinity” aka "maxuint")
changes: [{
at: version where status changes
status: ...
}]
}]
An object in the versions list can be either:
The algorithm for deciding the status of a particular version v is then:
for entry in product.versions {
if entry.lessThan is not present and entry.lessThanOrEqual is not present and v == entry.version {
return entry.status
}
if (entry.lessThan is present and entry.version <= v and v < entry.lessThan) or
(entry.lessThanOrEqual is present and entry.version <= v and v <= entry.lessThanOrEqual) {
status = entry.status
for change in entry.changes {
if change.at <= v {
status = change.status
}
}
return status
}
}
return product.defaultStatus
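As a minimal sketch, the pseudocode above could be written in Python roughly as follows. The names `Entry`, `Change`, `version_key`, and `status_of` are illustrative and not part of the schema. `version_key` assumes plain dotted numeric versions with no pre-release tags, which is a simplification of real semver ordering, and the loop assumes `changes` is listed in ascending order of `at`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def version_key(v):
    """Sort key for dotted numeric versions; '*' sorts above every version."""
    if v == "*":
        return (1, ())
    parts = [int(p) for p in v.split(".")]
    parts += [0] * (3 - len(parts))  # pad so that "0" compares like "0.0.0"
    return (0, tuple(parts))


@dataclass
class Change:
    at: str
    status: str


@dataclass
class Entry:
    version: str
    status: str
    less_than: Optional[str] = None
    less_than_or_equal: Optional[str] = None
    changes: List[Change] = field(default_factory=list)


def status_of(v, entries, default_status):
    kv = version_key(v)
    for e in entries:
        start = version_key(e.version)
        # Single-version entry: exact match only.
        if e.less_than is None and e.less_than_or_equal is None:
            if kv == start:
                return e.status
            continue
        # Range entry: [version, lessThan) or [version, lessThanOrEqual].
        below = e.less_than is not None and kv < version_key(e.less_than)
        at_most = (e.less_than_or_equal is not None
                   and kv <= version_key(e.less_than_or_equal))
        if start <= kv and (below or at_most):
            status = e.status
            for change in e.changes:  # the last change at or before v wins
                if version_key(change.at) <= kv:
                    status = change.status
            return status
    return default_status


entries = [
    Entry("2.8.0", "affected", less_than="*",
          changes=[Change("3.1.0", "unaffected")]),
]
print(status_of("2.9.1", entries, "unknown"))
print(status_of("3.1.0", entries, "unknown"))
print(status_of("2.0.0", entries, "unknown"))
```

Here 2.9.1 falls in the range before any change, 3.1.0 is picked up by the change event, and 2.0.0 matches no entry and so gets the default.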
Fixes #87. Fixes #12. Fixes #77.
I also added 'custom' as a versionType that is not directly computable without further information. That will be necessary for upconverting the JSON 4.0 data.
If we are adding lessThan and lessThanOrEqual to allow up-converting <=, do we need a versionAfter to allow up-converting >? I feel we are complicating the structure for backwards compatibility.
I think it is probably important to rename limit to lessThan for clarity. I don't have a strong opinion on adding lessThanOrEqual or not: I will defer to you and others who understand how much weight to give up-converting issues.
I do observe that > is significantly less common in the 4.0 data than <=.
% cd cvelist
% git grep -E -h '"(version_)?affected"' |
sed 's/version_//; s/[ ][ ]*/ /g; s/,//' |
sort |
uniq -c |
sort -nr
12965 "affected": "<"
10495 "affected": "="
2608 "affected": "<="
1104 "affected": ">="
298 "affected": "!>="
211 "affected": "!=>"
149 "affected": "!"
98 "affected": "?>"
82 "affected": "!<"
42 "affected": "?"
32 "affected": ""
26 "affected": "?<="
21 "affected": ">"
11 "affected": "!>"
9 "affected": "undefined"
8 "affected": "?<"
4 "affected": "=>"
3 "affected": "2021.1.7316"
3 "affected": "2021.1.7149"
3 "affected": "2020.6.5146"
3 "affected": "!<="
2 "affected": "1.09"
1 "affected": "?>="
1 "affected": "=6.3.x"
1 "affected": "<=7.1.3.1"
1 "affected": "2020.6.4671"
1 "affected": "2018.9.17"
1 "affected": "10.16.3"
1 "affected": "0.9"
1 "affected": "!=<"
%
I spot-checked the "?>" entries and all the ones I looked at were Jenkins plugins that used the form:
{
"version_value": "1.8",
"version_affected": "<="
},
{
"version_value": "1.5.2",
"version_affected": ">="
},
{
"version_value": "1.8",
"version_affected": "?>"
}
The ?> could be dropped here since unknown would be the default anyway after saying affected in the range [1.5.2, 1.8] (using lessThanOrEqual).
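For illustration only, here is one way a converter might fold those three JSON 4 rows into a single range entry. The function name `upconvert_bounded` is hypothetical, the sketch assumes exactly one ">=" row and one "<=" row, and it is not the actual upconverter:

```python
def upconvert_bounded(rows, default_status="unknown"):
    """Fold a '>=' row and a '<=' row into one JSON 5 style range entry.

    Rows with other operators (such as '?>') are ignored, on the assumption
    that default_status already covers versions outside the range.
    """
    lower = next(r["version_value"] for r in rows if r["version_affected"] == ">=")
    upper = next(r["version_value"] for r in rows if r["version_affected"] == "<=")
    return {
        "defaultStatus": default_status,
        "versions": [
            {"version": lower, "lessThanOrEqual": upper, "status": "affected"}
        ],
    }


jenkins_rows = [
    {"version_value": "1.8", "version_affected": "<="},
    {"version_value": "1.5.2", "version_affected": ">="},
    {"version_value": "1.8", "version_affected": "?>"},
]
print(upconvert_bounded(jenkins_rows))
```

The "?>" row contributes nothing to the output because versions after 1.8 already fall through to the unknown default.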
I also looked at the > entries and many of them appear to be bugs. For example CVE-2021-0253 says
{
"platform": "NFX Series",
"version_affected": ">",
"version_name": "19.4",
"version_value": "19.4R3"
},
but https://kb.juniper.net/InfoCenter/index?page=content&id=JSA11146&actp=METADATA says clearly "19.4R3 and above", so this should be ">=".
So it does not seem like the case for versionAfter is anywhere near as strong as lessThanOrEqual.
Thank you for the stats! The numbers for >, !>, ?> are small enough they can be flagged for up-conversion by hand. We don't need a versionAfter.
The numbers for lessThanOrEqual are significant but smaller. If they are coming from a few CNAs (and if they can fix it at the source), then we can consider it deprecated - slated for removal in the future.
In JSON 4, "version_affected": "<=" implies that, somewhere on the timeline after version_value, an event occurs such that the status is no longer asserted to be "affected" - and "unaffected" and "unknown" are both plausible post-event statuses. Here, "the timeline" is used to mean any of the mechanisms for entering version data, e.g., changes, version, or lessThan. The argument for lessThanOrEqual in JSON 5 is:
there are thousands of affected CVE records
without lessThanOrEqual, upconversion has two anomalies:
there may be no volunteers who can determine all of the correct post-event statuses before the deadline (November 2021)
if upconversion always chooses "unaffected" or always chooses "unknown" for the post-event status, then it destroys data that another entity may be relying on, because they use a different method to estimate what <= means
For this last point, another entity (e.g., a commercial vulnerability-assessment product) may currently be relying on https://github.com/CVEProject/cvelist to deliver computable data to its own constituents, e.g., with a more complex algorithm such as:
switch assigner {
case "contact@wpscan.com":
    fmt.Println("the post-event status is unknown")
case "cna@mongodb.com",
    "psirt@adobe.com",
    "psirt@paloaltonetworks.com",
    "security@tibco.com":
    fmt.Println("the post-event status is unaffected")
default:
    fmt.Println("the event's meaning is unspecified")
}
If upconversion always maps <= to the same post-event status, then it's impossible for that entity (using only the JSON 5 document set) to deliver the data quality that they previously delivered. Also, having them continue to use the JSON 4 document set forever isn't a good solution because, starting sometime in 2022, the JSON 4 document set will reach end-of-life.
Examples:
The situation may be less consistent when:
a CNA (except for contact@wpscan.com) produces <= data about products that it doesn't directly control, e.g.,
the CNA has many persons who are producing <= data for subparts of the CNA's scope (e.g., security@apache.org)
FWIW, I think that the content/diagrams in the introductory slides above should at least be referenced somewhere in the docs for the version array or in whatever User Guide we eventually create. The visualizations are very important to aid in understanding what is being done here, and understanding is important to proper usage.
I generated some docs using json-schema-for-humans, which generates HTML docs based off of those descriptions. I'm still not confident that the majority of CNAs will understand the implications behind all of that from those schema descriptions alone.
@ElectricNroff Summarizing your concern: there are many CVE entries that simply have information like "CVE affects versions before v1, before v2, and before v3" and nothing else (no version group, no starting points, no affirmative not-affected statements). In those cases:
defaultStatus: unaffected
versions: [{
version: '0' // do we need a first-ever indicator? empty string, 0 or * ?
status: affected
versionType: semver if versions match semver pattern, else custom.
lessThan: '*'
changes: [{
at: v1 status: unaffected
at: v2 status: unaffected
at: v3 status: unaffected
}]
}]
alternatively:
defaultStatus: unaffected
versions: [{
version: '0'
lessThan: v1
status: affected
versionType: semver if versions match semver pattern, else custom.
},{
version: '0'
lessThan: v2
status: affected
versionType: semver if versions match semver pattern, else custom.
},{
version: '0'
lessThan: v3
status: affected
versionType: semver if versions match semver pattern, else custom.
}]
The entries were not computable in v4, and they will not be computable in v5. IMHO that is acceptable as this bug/pull request is not about making previously uncomputable info into computable. The CNAs now have better ways to express the same information.
Update: defaultStatus is set to unaffected. That gives the expected results.
JSON 4 data that says "before" (aka the < comparison) isn't one of the hardest cases. JSON 4 data that says <= (sometimes expressed as "through v#.#.#") is a hard one. Also, I don't think either of your options for "before" would typically be used. Adjacent entries on an "at" timeline should have different statuses. Also, multiple entries of version zero and the same status can be replaced by the one entry with the highest limit (i.e., the v3 one). If the available data is that versions before 1.7.3, before 2.3.9, and before 3.2.1 are affected, then there are three upconversion options that may be reasonable choices:
Of course, only the third option can be error-free. The third option can often work well for CVE consumers who use the CVE Record data very soon after it's published (e.g., before the vendor has an opportunity to release 3.2.2). This scenario applies to CNAs who will continue to use that < data pattern in their JSON 4 documents that are published after CVE Services 2.0 has launched.
@chandanbn I think you meant 'defaultStatus: unaffected' throughout https://github.com/CVEProject/cve-schema/issues/87#issuecomment-906584822
@ElectricNroff, with both lessThan and lessThanOrEqual as options, along with the defaultStatus we added at your earlier suggestion, it looks to me like essentially all the JSON 4 data can be encoded faithfully. There is a question of what to do with entries that don't explicitly say "version X and above are unaffected", but that's a question for the converter: whatever the answer should be, it can be encoded precisely and clearly.
I can't quite tell: is your last comment arguing in favor of lessThanOrEqual, or are you saying that something else is needed as well?
I feel that the current design (e.g., with defaultStatus, lessThan, and lessThanOrEqual) is adequate, but that (when reasonably achievable) the upconverter should avoid adding explicit assertions that weren't present in the JSON 4 data.
For example, from the perspective of the algorithm used by the CVE Program, these two (which could be chosen for <= 3.2.1 in JSON 4 data) are exactly equivalent:
defaultStatus: unknown
...
versions: [
{
version: 0, lessThanOrEqual: 3.2.1, versionType: semver,
status: affected
}
]
defaultStatus: unknown
...
versions: [
{
version: 0, lessThanOrEqual: 3.2.1, versionType: semver,
status: affected
},
{
version: 3.2.2, lessThan: *, versionType: semver,
status: unknown
}
]
The reason that the first one is preferable is that a different entity (e.g., a commercial vulnerability-assessment product) may have the resources to develop their own algorithm that replaces:
return product.defaultStatus
with something like:
if ((version array has a length of 1 and contains lessThanOrEqual) and cveMetadataPublished.assigner == theAdobeUuid) {
return unaffected
}
return product.defaultStatus
if their customers demand that (and if Adobe was unwilling to change the data).
In other words, immediately before the "return product.defaultStatus" line is a hook point that third parties can use to insert their own code. In an actual use case, the third party would have to start from the algorithm pseudocode and implement a modified version on their own. The CVE Program isn't planning to package the algorithm as a standalone software product (and, even if it did, the product wouldn't ship with a supported extension framework).
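To make the hook point concrete, here is a small sketch under stated assumptions. The `fallback` parameter, the `vendor_hook` function, and the assigner value "hypothetical-vendor-uuid" are all invented for illustration. The status function is a reduced form of the algorithm that ignores `changes`, assumes each entry has at most one of lessThan and lessThanOrEqual, and orders versions as dotted integers.

```python
def key(v):
    """Dotted numeric ordering; '*' sorts above every version."""
    if v == "*":
        return (1, ())
    parts = [int(p) for p in v.split(".")]
    return (0, tuple(parts + [0] * (3 - len(parts))))


def status_of(v, record, fallback=None):
    kv = key(v)
    for e in record["versions"]:
        low = key(e["version"])
        if "lessThan" in e:
            hit = low <= kv < key(e["lessThan"])
        elif "lessThanOrEqual" in e:
            hit = low <= kv <= key(e["lessThanOrEqual"])
        else:
            hit = kv == low
        if hit:
            return e["status"]  # `changes` handling omitted for brevity
    # Hook point: a third party may substitute its own answer here.
    if fallback is not None:
        override = fallback(record)
        if override is not None:
            return override
    return record["defaultStatus"]


def vendor_hook(record):
    """Hypothetical third-party rule for one assigner's '<=' records."""
    versions = record["versions"]
    if (len(versions) == 1 and "lessThanOrEqual" in versions[0]
            and record["assigner"] == "hypothetical-vendor-uuid"):
        return "unaffected"
    return None


first = {
    "assigner": "hypothetical-vendor-uuid",
    "defaultStatus": "unknown",
    "versions": [
        {"version": "0", "lessThanOrEqual": "3.2.1", "status": "affected"},
    ],
}
second = {
    "assigner": "hypothetical-vendor-uuid",
    "defaultStatus": "unknown",
    "versions": first["versions"] + [
        {"version": "3.2.2", "lessThan": "*", "status": "unknown"},
    ],
}

print(status_of("3.2.2", first), status_of("3.2.2", second))
print(status_of("3.2.2", first, vendor_hook), status_of("3.2.2", second, vendor_hook))
```

Without the hook the two records agree. With it, only the first record lets the third party reach its own conclusion, because the explicit "unknown" entry in the second record matches before the hook is ever consulted.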
Background
The OSV schema has been adopted by Go, OSV, Python, Rust, and UVI to describe vulnerabilities in open-source software. The OSV schema’s key advantage over the CVE format is that it identifies the specific affected packages and versions in a precise, computable way.
For example, suppose we wanted to check whether a particular software package, as described by an SBOM, made use of any open-source components with known vulnerabilities. An SBOM for a given package ecosystem would be a list of its packages and versions. A tool can test whether each SBOM entry is affected by a database entry written to the OSV schema, without any additional information (such as a version or commit graph or access to the repository containing the source code for the open-source software). This is what we mean when we say the package and version identification is computable.
We propose that the new CVE JSON schema be changed to make its package and version identification computable too. This would make it possible for vulnerability-checking tools to check SBOMs against the CVE database as easily as they can currently check SBOMs against OSV-schema databases. Adjusting the CVE JSON schema would also allow OSV-schema databases to embed their information into CVE format, allowing all their vulnerability information to be pushed upstream to the CVE database and then propagated to any CVE-aware software, a net benefit for the entire software ecosystem.
This issue focuses on computable version identification. See issue #86 for computable package identification.
Computable version identification
After identifying that a particular package listed in an SBOM matches a package in a CVE database entry (#NNN), a vulnerability scanner must next identify whether the specific version in the SBOM is considered affected by the CVE. The entry must include self-contained information sufficient to make this decision algorithmically. The current schema does not satisfy this requirement (or else it is unclear how it does).
What is the algorithm for deciding if a version is considered affected? The current spec does not provide details on how to evaluate the rules. At the start, it is unclear whether the “versions” list must be grouped by “versionGroup” before further processing, so we’ll suppose there is a single group in our examples. It is also unclear which logical operator to apply to the version entries. Issue #12 says that rules should be evaluated with AND, which makes it impossible to list individual versions. For example:
The explanation in #12 is that this means “version = 1.0.0 AND version = 1.1.0”, which doesn’t match any version at all.
According to the answer in #12, expressing multiple disjoint ranges of versions is also not possible. For example:
Here it seems clear the intended interpretation would be
but there is no obvious way to encode this. Using ! operators would also not work. There is no boolean normal form with only one logical operator (that is, only AND, or only OR).
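The problem can be reproduced with a few lines of Python. This is only a sketch: the rule lists use illustrative version numbers, the helper names are made up, and versions are ordered as dotted integers rather than by any particular packager's rules.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def parse(version):
    return tuple(int(part) for part in version.split("."))


def holds(version, rule):
    op, operand = rule
    return OPS[op](parse(version), parse(operand))


def match_all(version, rules):  # every rule combined with AND
    return all(holds(version, r) for r in rules)


def match_any(version, rules):  # every rule combined with OR
    return any(holds(version, r) for r in rules)


def match_grouped(version, groups):  # OR of AND-groups: needs both operators
    return any(match_all(version, group) for group in groups)


listed = [("=", "1.0.0"), ("=", "1.1.0")]
two_ranges = [(">=", "1.0.0"), ("<", "1.5.0"), (">=", "2.0.0"), ("<", "2.5.0")]
groups = [two_ranges[:2], two_ranges[2:]]

print(match_all("1.0.0", listed))      # AND of two equalities matches nothing
print(match_any("1.7.0", two_ranges))  # flat OR matches versions in the gap too
print(match_grouped("1.7.0", groups))  # grouping recovers the intended answer
```

With a flat AND the two-range rule list matches no version at all, and with a flat OR it matches every version, so the intended "1.0.0 up to 1.5.0, or 2.0.0 up to 2.5.0" needs some form of grouping.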
A second, related problem with the current schema is that even the definitions of operators like “>=” are not algorithmically precise. Clearly these are not string comparisons: 1.2.0 < 1.10.0. But neither are they simple element-wise comparisons: in packagers using Semver, 1.2.0 > 1.2.0-alpha. In Maven, even the alphabetic parts do not compare with strict regularity. In particular, this ordering applies:
An operator like “>=” cannot be applied without reference to a particular version ordering algorithm, and the CVE schema omits that information.
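A short sketch shows how the three readings disagree. `semver_key` is an illustrative, simplified implementation of SemVer precedence (numeric core, optional pre-release identifiers, build metadata ignored), and `naive_key` is a deliberately naive element-wise split included only for contrast. Neither is taken from the CVE or OSV tooling.

```python
import re


def naive_key(version):
    """Naive element-wise split on '.' and '-', compared as strings."""
    return re.split(r"[.-]", version)


def semver_key(version):
    """Simplified SemVer precedence key (build metadata is ignored)."""
    version = version.split("+")[0]
    core, _, pre = version.partition("-")
    nums = tuple(int(p) for p in core.split("."))
    if not pre:
        return (nums, (1,))  # a release sorts after all of its pre-releases
    ids = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p)  # numeric ids sort first
        for p in pre.split(".")
    )
    return (nums, (0,) + ids)


# Plain string comparison gets multi-digit components wrong.
print("1.2.0" < "1.10.0")
# The naive split treats the longer pre-release version as the greater one.
print(naive_key("1.2.0-alpha") > naive_key("1.2.0"))
# SemVer precedence orders both pairs as the text describes.
print(semver_key("1.2.0") < semver_key("1.10.0"))
print(semver_key("1.2.0") > semver_key("1.2.0-alpha"))
```

The same ">=" operator therefore gives different answers depending on which key function is used, which is why the ordering has to be named in the record.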
The different operator variants are also confusing. For example, is there any difference between these two?
Or is this one any different from those two?
The result of “is this version affected?” should be a boolean yes/no, or at worst yes/no/maybe, but the current operators allow yes/no/maybe/undocumented, with no guidance as to what CVEs should do. Should tools treat “no” differently from “undocumented”? Is it a best practice to document all the negative ranges too? Why?
The CVE schema needs to address these deficiencies so that tools have clear algorithms for deciding whether a particular version is affected by a particular CVE.
OSV’s solution
The OSV schema addresses all these ambiguities as follows, and we suggest that CVE adopt its basic ideas. This is not the only possible solution but we believe it is a good one.
The OSV schema supports both an enumeration of specific affected versions and an enumeration of specific affected ranges. The set of affected versions is the OR of the entries in these lists - there is never an AND.
A range specifies a contiguous range of versions according to some defined version ordering. Today, those are “SEMVER” (preferred), “GIT”, and “ECOSYSTEM”. The “GIT” and “ECOSYSTEM” (meaning “packager-defined ordering”) range types are not directly understandable by general-purpose tools; such ranges are extra information understandable only by special-purpose tools. A particular entry is required to ensure that all affected versions are either listed in the explicit enumeration or in a Semver-type range, both of which can be processed by standard, packager-independent algorithms.
Each range is an object with three fields: type (the ordering), introduced, and fixed. The affected versions are those >= introduced and < fixed. If introduced or fixed are omitted, then that end of the range is left open.
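The matching rule described here is small enough to sketch directly. The function names and the sample data below are illustrative assumptions, and versions are ordered as dotted integers instead of by a full Semver comparison:

```python
def parse(version):
    return tuple(int(part) for part in version.split("."))


def in_range(version, introduced=None, fixed=None):
    """True if introduced <= version < fixed; a missing bound leaves that end open."""
    v = parse(version)
    if introduced is not None and v < parse(introduced):
        return False
    if fixed is not None and v >= parse(fixed):
        return False
    return True


def is_affected(version, versions=(), ranges=()):
    """OR over the explicit version list and every range; there is never an AND."""
    if version in versions:
        return True
    return any(
        in_range(version, r.get("introduced"), r.get("fixed")) for r in ranges
    )


sample_ranges = [
    {"type": "SEMVER", "introduced": "1.0.0", "fixed": "1.4.2"},
    {"type": "SEMVER", "introduced": "2.0.0", "fixed": "2.1.0"},
]

for candidate in ("1.4.1", "1.4.2", "2.0.5", "3.0.0"):
    print(candidate, is_affected(candidate, ranges=sample_ranges))
```

Because the only combining operator is OR, adding another range or another listed version can only grow the affected set.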
For packagers that use Semver ordering, such as Go, NPM, and Rust, it suffices to specify only ranges:
For packagers that use other orderings, a packager-specific range can be listed, but the packager’s own vulnerability database tooling must “compile out” the range into an explicit list as well, for consumption by general-purpose tools, as in this Python example:
(The “GIT” range has an additional field “repo” to specify the URL of the source repository containing the given commits.)
The “versions” list specifies the same versions as in the “ECOSYSTEM” range, just in a more accessible way. General-purpose tooling would ignore the “GIT” and “ECOSYSTEM” ranges, relying instead on the “versions” list in this case.
Potential CVE adaptation
We propose to change the current version schema from:
to:
The only combining operator is OR, making the algorithm for matching much clearer. A particular version would be considered affected if it is matched by any of the entries in the overall “versions” object list. A version is matched by an entry if it appears directly in the “list” or if it is in the “range”. This structure allows non-standard ranges to include their version lists in the same object, which is an improvement over the OSV schema, and it allows a particular range or list to be qualified by a “platform” list as well.
The “unsure” entry allows a range or list to be marked as unsure, equivalent to using the current ?>= etc operators.
The current !>= etc operators are removed: to say that a version is unaffected, leave it unlisted.