Closed oliverchang closed 1 year ago
Also, if there are fix patch information available, it would be super useful to users if we provide those as well via OSV's git ranges. e.g. in OSV/OSS-Fuzz we already have this: https://github.com/google/oss-fuzz-vulns/blob/53ad4dcef49b6c5f1c26480832011328e472b7c3/vulns/curl/OSV-2022-450.yaml#L28
This would allow users who are using curl as a library (and pulling it in as a e.g. a submodule) to automatically make use of curl advisories by just checking git trees.
Thanks for the excellent feedback!
We have a JSON schema available that performs some of this validation
Being a JSON rookie, what's the easiest way to verify this json against that schema?
The CVSS
We fake the score based on the named severity level (which is the only severity we provide), but I cannot fake a whole vector string. I don't think we can comply with this. I think it's a weakness in the schema.
CVSS is a poor system and we don't play long in that game.
We only have one of Low, Medium, High or Critical as level.. How do we convey that in the metadata? Should we just leave out severity and provide it as database_specific? Feels a bit lame.
Only one of "last_affected", or "fixed" should be specified, but not both.
Aren't both informative? (I actually didn't see "last_affected" before which is why it uses the wrong name)
by defining a "Curl" ecosystem
I hope you pick another solution. We are not an ecosystem. This problem is not unique to us and it would be a pity if others in our situation would have to claim they too are ecosystems.
if there are fix patch information available, it would be super useful to users if we provide those as well via OSV's git ranges
Neat. That's indeed a cool idea. I think we have this information for most/many issues but it is going to take some work for us to convert that into structured data that we can pull out and insert into JSON. I will work on it.
I made #241 to fix some of the easy things first.
Neat. That's indeed a cool idea. I think we have this information for most/many issues but it is going to take some work for us to convert that into structured data that we can pull out and insert into JSON. I will work on it.
Thinking further, I think I won't. We already provide exactly every single vulnerable version and it is easy for any git user to figure out if they are in a vulnerable range or not.
Thanks for the excellent feedback!
We have a JSON schema available that performs some of this validation
Being a JSON rookie, what's the easiest way to verify this json against that schema?
Check out the README here https://github.com/ossf/osv-schema/tree/main/validation :)
The CVSS
We fake the score based on the named severity level (which is the only severity we provide), but I cannot fake a whole vector string. I don't think we can comply with this. I think it's a weakness in the schema.
CVSS is a poor system and we don't play long in that game.
We only have one of Low, Medium, High or Critical as level.. How do we convey that in the metadata? Should we just leave out severity and provide it as database_specific? Feels a bit lame.
database_specific
is the only option here for now. The reason why we don't support low/medium/high/critical is because they don't allow for consistency across different sources. Every database tends to have its own qualitative scale for these.
Only one of "last_affected", or "fixed" should be specified, but not both.
Aren't both informative? (I actually didn't see "last_affected" before which is why it uses the wrong name)
We see "fixed" as being the most informative here. last_affected is redundant in such cases because we have the versions
array and anything before "fixed" is irrelevant.
by defining a "Curl" ecosystem
I hope you pick another solution. We are not an ecosystem. This problem is not unique to us and it would be a pity if others in our situation would have to claim they too are ecosystems.
Indeed it's not ideal, but the best solution we've come up with so far is to use repository URLs to achieve this. Most open source projects can leverage this "namespace/ecosystem" of repository URLs to identify themselves.
I do understand that repository URLs aren't totally accurate for curl, but you still be agreeable to using that as an identifier?
Thinking further, I think I won't. We already provide exactly every single vulnerable version and it is easy for any git user to figure out if they are in a vulnerable range or not.
Could I convince you otherwise? :) It's not trivial for git users to map the versions back to git tags/commits in an automated way. Also if the git metadata is provided explicitly, it enables our https://osv.dev API to index it and allow for queries such as:
curl -d \
'{"commit": "6879efc2c1596d11a6a6ad296f80063b558d5e0f"}' \
"https://api.osv.dev/v1/query"
To return any curl results that match.
Thanks!
We use vulnerability data and I can confirm that the low/medium/high/critical scales differ wildly between different vendors, the database_specific
field really is the best place for it especially if it has some information on which database/vendor made the call on the score.
And regarding the last_affected
information, it does get stale quite fast and is only perhaps useful up until the fixed version comes along. And even then if new, still affected, versions come out it even becomes 'wrong' in a way, and could trip someone up as if they are affected or not. The only information it really provides is when the vulnerability was noticed, as far as I can tell.
I also do hope it doesn't come to defining ecosystems for libraries, the repository URL solution seems like a better one, I do agree with @oliverchang that it all comes back to the source.
We use vulnerability data and I can confirm that the low/medium/high/critical scales differ wildly between different vendors
Maybe, but the CVSS score is in no way better. NVD for example re-scores everything anyway so whatever we provide, they modify it anyway. The CVSS is also too one-dimensional and leaves out a lot of factors which tends to make the score higher than they would be when taken a larger take. Which is part of the reason why we in the curl project opt to not play the CVSS game.
There simply is no objective way to establish the severity of an issue. There are but opinions of different parties.
By not providing any info at all about severity or package the json object complies with the schema... Not sure how that makes the JSON better, but at least there is no complaints from the tool.
Could I convince you otherwise?
Perhaps. It's a lot of work, and there are now several pieces of metadata we cannot provide anyway in the JSON object so the question is how much value we add by adding data to a partially supported JSON object?
Like this:
curl -d \ '{"commit": "6879efc2c1596d11a6a6ad296f80063b558d5e0f"}' \ "https://api.osv.dev/v1/query"
Doesn't that also need to tell which package it asks about? And you cannot ask about curl because it has no ecosystem so that field cannot be populated: so this command line cannot be run anyway.
I've started to update the advisories (and thus the json objects) with specific git commit info: introduced and fixed.
Doesn't that also need to tell which package it asks about? And you cannot ask about curl because it has no ecosystem so that field cannot be populated: so this command line cannot be run anyway.
The ecosystem
field is required in the package
object, but the package object is not required. You can have only the ranges, as far as I'm aware, and the API takes just the commit hash.
the API takes just the commit hash
How does the API know which project/git repo to check then?
I have now gone through and provided git range information to what I believe is all curl CVEs filed since 2017 (current count: 85). Those advisories have been updated and the JSON objects are now populated automatically with metadata from those.
How does the API know which project/git repo to check then?
The spec says the git commit hashes must be full-length, which makes them unique across all projects, making the project or git repo unnecessary. I notice that the ones in the curl JSON are truncated.
which makes them unique across all projects
So it then scans through all projects to find a match? Didn't expect that.
The spec says the git commit hashes must be full-length
It sounds like the schema needs to be updated because it does not object to shortened hashes...
I made the script convert all hashes to the long format now.
that turned out to be a good idea, since that also verifies that the hash is correctly entered so it helped me find a few mistakes!
Are the old curl advisory IDs easily accessible (e.g. adv_20150108A)? Those are listed in places under "vendor advisory" so would be good to have in the JSON as an alias so they can be found that way.
We never used them as actual IDs in the project, we used them as URLs. We still have working redirects for those old URLs to end up on the appropriate current page.
Is it really a good idea to upgrade those to "official IDs" now when we haven't used them for the last five years? Surely the only users of those are using the URLs?
They could be resurrected from git, but then we would also need to invent a way to store it in the advisories and get it into the JSON.
I suspected it might not be easy any longer. It's probably not worth the effort then.
Are there any more details left to fix now? The JSON seems to verify against the schema now. We now provide several different JSON sets:
Thank you all and @bagder for this! The JSON output looks pretty great now :) And we'll soon work on ingesting this into osv.dev to make this useful to all OSV users.
The very last requests here are to:
https://curl.se/docs/vuln/<ID>.json
to make individual entries easier to refer to.Point 2 is also helpful for our own immediate osv.dev ingestion as well, since right now we only support ingesting OSV from git repos, or GCS buckets. We have an in-process FR to support ingesting from HTTP endpoints as well (such as your current feed), but that may take a little bit of time.
Add a ID specific endpoint
Already done. Here's an example: https://curl.se/docs/CVE-2022-35252.json (there are links to these from the corresponding html version on the web)
We also have curl version specific endpoints. Example: https://curl.se/docs/vuln-7.88.1.json - which contains all the CVEs that affect that specific curl release.
Possibly publish this in a git repo as well?
Everything that ends up in the JSON objects is data that is already present in git.
The JSON files themselves are generated with a script that is also in git. The content for the JSON comes from each separate advisory markdown (example https://github.com/curl/curl-www/blob/master/docs/CVE-2021-22926.md) combined with the vuln.pm file (https://github.com/curl/curl-www/blob/master/docs/vuln.pm).
The JSON files themselves are not in git because it seems much better to do it this way. Now we can easily regenerate them whenever we need to. If we want to update the JSON objects or if any metadata changes.
Add a ID specific endpoint
Already done. Here's an example: https://curl.se/docs/CVE-2022-35252.json (there are links to these from the corresponding html version on the web)
Awesome!! Any way to get https://curl.se/docs/CURL-CVE-2022-35252.json to work as well? (to match the "id" field).
Everything that ends up in the JSON objects is data that is already present in git.
The JSON files themselves are not in git because it seems much better to do it this way. Now we can easily regenerate them whenever we need to. If we want to update the JSON objects or if any metadata changes.
Sure, that sounds reasonable.
Any way to get https://curl.se/docs/CURL-CVE-2022-35252.json to work as well? (to match the "id" field).
You are already making me regret faking that Id. No. That's not a real Id. That's a fake one we put there only because the schema doesn't allow us to use the only Id we have for the issue: the CVE Id. I don't want to add infra around this fake Id to make it seem like its real.
I made the "URL" key in the JSON use the URL to the CVE specific JSON instead of the HTML one like I had before. I think the official schema should have a URL field like this.
Any way to get https://curl.se/docs/CURL-CVE-2022-35252.json to work as well? (to match the "id" field).
You are already making me regret faking that Id. No. That's not a real Id. That's a fake one we put there only because the schema doesn't allow us to use the only Id we have for the issue: the CVE Id. I don't want to add infra around this fake Id to make it seem like its real.
It's sadly necessary when there are several vulnerability databases that export the same IDs. We need a way to disambiguate them. e.g. https://osv.dev/vulnerability/DLA-3288-1 also refers to the same CVE, but we know this is the Debian specific one because of the DLA ID.
For the sake of consistency, https://curl.se/docs/CURL-CVE-2022-35252.json would make it easier for users to discover from a given OSV ID in an automated way. What if https://curl.se/docs/CURL-CVE-2022-35252.json redirected to https://curl.se/docs/CVE-2022-35252.json instead?
But what would make anyone try accessing https://curl.se/docs/$curl-cve-id.json
in the first place and assume it works?
But what would make anyone try accessing
https://curl.se/docs/$curl-cve-id.json
in the first place and assume it works?
A few places I can think of:
From the "Source" links provided by https://osv.dev: https://osv.dev/vulnerability/DLA-3288-1. We directly just append the OSV ID to a base URL to generate these.
We'd also like to add Curl to the list at https://ossf.github.io/osv-schema/#id-modified-fields, which also enumerates the "Source URL" to access an entry by ID.
Okay. It would be way better if you allowed the issues to provide working URL themselves instead of assuming how you can request them. But I've made https://curl.se/docs/CURL-$cve-id.json
work now, and they will redirect to https://curl.se/docs/$cve-id.json
.
Thank you for understanding! I'll close this for now, since I don't think there's any other changes needed on your end here.
Hey! I'm from the OSV (and OSS-Fuzz :) ) team. Really excited to see https://github.com/curl/curl-www/pull/237.
It looks like the current export is not spec-compliant, and we'd like to help work together on making this more compliant and get this integrated into https://osv.dev.
We have a JSON schema available that performs some of this validation that you can run against the current entry, which gives:
Additionally, there are some issues where our JSON schema current fails to validate:
the "id" field is the CVE ID. Is there a CURL specific ID instead that we can use to disambiguate this from other OSV sources? The "CVE" id can go into the "aliases" field to help link it with other sources in OSV that have the same CVE. A cheap way to do this is to just prepend "CURL-" to the CVE ID or some variation of that.
Only one of "last_affected", or "fixed" should be specified, but not both.
Generally for fields that don't fit into OSV, you can also use
database_specific
: https://ossf.github.io/osv-schema/#database_specific-field to add your own fields that don't fit into the main OSV fields.CC @dfandrich who also raised the issue on the
package
naming in https://github.com/ossf/osv-schema/issues/94. Hopefully we can reach a resolution on that as well, (e.g. by defining a "Curl" ecosystem).