OSV spec - Githubissues

oliverchang commented 1 year ago

Hey! I'm from the OSV (and OSS-Fuzz :) ) team. Really excited to see https://github.com/curl/curl-www/pull/237.

It looks like the current export is not spec-compliant, and we'd like to help work together on making this more compliant and get this integrated into https://osv.dev.

We have a JSON schema available that performs some of this validation that you can run against the current entry, which gives:

2023-05-02T13:52:45+02:00Z: '2023-05-02T13:52:45+02:00Z' does not match '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\\.[0-9]+)?Z'
2023-03-20: '2023-03-20' does not match '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\\.[0-9]+)?Z'
basic: 'basic' is not one of ['CVSS_V2', 'CVSS_V3']
1.0: '1.0' does not match 'CVSS:3\\.[0-9]/AV:[NALP]\\/AC:[LH]\\/PR:[NLH]\\/UI:[NR]\\/S:[UC]\\/C:[NLH]\\/I:[NLH]\\/A:[NLH]'
{'name': 'curl', 'purl': 'pkg:generic/curl'}: 'ecosystem' is a required property
{'last': '7.88.1'}: {'last': '7.88.1'} is not valid under any of the given schemas
{'id': 'CVE-2023-27538', 'summary': 'SSH connection too eager reuse still', 'URL': 'https://curl.se/docs/CVE-2023-27538.html', 'modified': '2023-05-02T13:52:45+02:00Z', 'CWE': 'CWE-305: Authentication Bypass by Primary Weakness', 'published': '2023-03-20', 'affected': [{'package': {'name': 'curl', 'purl': 'pkg:generic/curl'}, 'ranges': [{'type': 'SEMVER', 'events': [{'introduced': '7.16.1'}, {'last': '7.88.1'}, {'fixed': '8.0.0'}]}], 'versions': ['7.88.1', '7.88.0', '7.87.0', '7.86.0', '7.85.0', '7.84.0', '7.83.1', '7.83.0', '7.82.0', '7.81.0', '7.80.0', '7.79.1', '7.79.0', '7.78.0', '7.77.0', '7.76.1', '7.76.0', '7.75.0', '7.74.0', '7.73.0', '7.72.0', '7.71.1', '7.71.0', '7.70.0', '7.69.1', '7.69.0', '7.68.0', '7.67.0', '7.66.0', '7.65.3', '7.65.2', '7.65.1', '7.65.0', '7.64.1', '7.64.0', '7.63.0', '7.62.0', '7.61.1', '7.61.0', '7.60.0', '7.59.0', '7.58.0', '7.57.0', '7.56.1', '7.56.0', '7.55.1', '7.55.0', '7.54.1', '7.54.0', '7.53.1', '7.53.0', '7.52.1', '7.52.0', '7.51.0', '7.50.3', '7.50.2', '7.50.1', '7.50.0', '7.49.1', '7.49.0', '7.48.0', '7.47.1', '7.47.0', '7.46.0', '7.45.0', '7.44.0', '7.43.0', '7.42.1', '7.42.0', '7.41.0', '7.40.0', '7.39.0', '7.38.0', '7.37.1', '7.37.0', '7.36.0', '7.35.0', '7.34.0', '7.33.0', '7.32.0', '7.31.0', '7.30.0', '7.29.0', '7.28.1', '7.28.0', '7.27.0', '7.26.0', '7.25.0', '7.24.0', '7.23.1', '7.23.0', '7.22.0', '7.21.7', '7.21.6', '7.21.5', '7.21.4', '7.21.3', '7.21.2', '7.21.1', '7.21.0', '7.20.1', '7.20.0', '7.19.7', '7.19.6', '7.19.5', '7.19.4', '7.19.3', '7.19.2', '7.19.1', '7.19.0', '7.18.2', '7.18.1', '7.18.0', '7.17.1', '7.17.0', '7.16.4', '7.16.3', '7.16.2', '7.16.1']}], 'severity': [{'type': 'basic', 'score': 'Low'}, {'type': 'CVSS_V3', 'score': '1.0'}], 'credits': [{'name': 'Harry Sintonen', 'type': 'FINDER'}, {'name': 'Daniel Stenberg', 'type': 'REMEDIATION_DEVELOPER'}], 'details': 'libcurl would reuse a previously created connection even when an SSH related\noption had been changed that should have prohibited reuse.\n\nlibcurl keeps previously used connections in a connection pool for subsequent\ntransfers to reuse if one of them matches the setup. However, two SSH settings\nwere left out from the configuration match checks, making them match too\neasily.'}: Additional properties are not allowed ('CWE', 'URL' were unexpected)

Additionally, there are some issues where our JSON schema current fails to validate:

the "id" field is the CVE ID. Is there a CURL specific ID instead that we can use to disambiguate this from other OSV sources? The "CVE" id can go into the "aliases" field to help link it with other sources in OSV that have the same CVE. A cheap way to do this is to just prepend "CURL-" to the CVE ID or some variation of that.
Only one of "last_affected", or "fixed" should be specified, but not both.

Generally for fields that don't fit into OSV, you can also use database_specific: https://ossf.github.io/osv-schema/#database_specific-field to add your own fields that don't fit into the main OSV fields.

CC @dfandrich who also raised the issue on the package naming in https://github.com/ossf/osv-schema/issues/94. Hopefully we can reach a resolution on that as well, (e.g. by defining a "Curl" ecosystem).

oliverchang commented 1 year ago

Also, if there are fix patch information available, it would be super useful to users if we provide those as well via OSV's git ranges. e.g. in OSV/OSS-Fuzz we already have this: https://github.com/google/oss-fuzz-vulns/blob/53ad4dcef49b6c5f1c26480832011328e472b7c3/vulns/curl/OSV-2022-450.yaml#L28

This would allow users who are using curl as a library (and pulling it in as a e.g. a submodule) to automatically make use of curl advisories by just checking git trees.

bagder commented 1 year ago

Thanks for the excellent feedback!

We have a JSON schema available that performs some of this validation

Being a JSON rookie, what's the easiest way to verify this json against that schema?

The CVSS

We fake the score based on the named severity level (which is the only severity we provide), but I cannot fake a whole vector string. I don't think we can comply with this. I think it's a weakness in the schema.

CVSS is a poor system and we don't play long in that game.

We only have one of Low, Medium, High or Critical as level.. How do we convey that in the metadata? Should we just leave out severity and provide it as database_specific? Feels a bit lame.

Only one of "last_affected", or "fixed" should be specified, but not both.

Aren't both informative? (I actually didn't see "last_affected" before which is why it uses the wrong name)

by defining a "Curl" ecosystem

I hope you pick another solution. We are not an ecosystem. This problem is not unique to us and it would be a pity if others in our situation would have to claim they too are ecosystems.

if there are fix patch information available, it would be super useful to users if we provide those as well via OSV's git ranges

Neat. That's indeed a cool idea. I think we have this information for most/many issues but it is going to take some work for us to convert that into structured data that we can pull out and insert into JSON. I will work on it.

I made #241 to fix some of the easy things first.

bagder commented 1 year ago

Neat. That's indeed a cool idea. I think we have this information for most/many issues but it is going to take some work for us to convert that into structured data that we can pull out and insert into JSON. I will work on it.

Thinking further, I think I won't. We already provide exactly every single vulnerable version and it is easy for any git user to figure out if they are in a vulnerable range or not.

oliverchang commented 1 year ago

Thanks for the excellent feedback!

We have a JSON schema available that performs some of this validation

Being a JSON rookie, what's the easiest way to verify this json against that schema?

Check out the README here https://github.com/ossf/osv-schema/tree/main/validation :)

The CVSS

We fake the score based on the named severity level (which is the only severity we provide), but I cannot fake a whole vector string. I don't think we can comply with this. I think it's a weakness in the schema.

CVSS is a poor system and we don't play long in that game.

We only have one of Low, Medium, High or Critical as level.. How do we convey that in the metadata? Should we just leave out severity and provide it as database_specific? Feels a bit lame.

database_specific is the only option here for now. The reason why we don't support low/medium/high/critical is because they don't allow for consistency across different sources. Every database tends to have its own qualitative scale for these.

Only one of "last_affected", or "fixed" should be specified, but not both.

Aren't both informative? (I actually didn't see "last_affected" before which is why it uses the wrong name)

We see "fixed" as being the most informative here. last_affected is redundant in such cases because we have the versions array and anything before "fixed" is irrelevant.

by defining a "Curl" ecosystem

I hope you pick another solution. We are not an ecosystem. This problem is not unique to us and it would be a pity if others in our situation would have to claim they too are ecosystems.

Indeed it's not ideal, but the best solution we've come up with so far is to use repository URLs to achieve this. Most open source projects can leverage this "namespace/ecosystem" of repository URLs to identify themselves.

I do understand that repository URLs aren't totally accurate for curl, but you still be agreeable to using that as an identifier?

Thinking further, I think I won't. We already provide exactly every single vulnerable version and it is easy for any git user to figure out if they are in a vulnerable range or not.

Could I convince you otherwise? :) It's not trivial for git users to map the versions back to git tags/commits in an automated way. Also if the git metadata is provided explicitly, it enables our https://osv.dev API to index it and allow for queries such as:

curl -d \
  '{"commit": "6879efc2c1596d11a6a6ad296f80063b558d5e0f"}' \
  "https://api.osv.dev/v1/query"

To return any curl results that match.

Thanks!

rhalar commented 1 year ago

We use vulnerability data and I can confirm that the low/medium/high/critical scales differ wildly between different vendors, the database_specific field really is the best place for it especially if it has some information on which database/vendor made the call on the score.

And regarding the last_affected information, it does get stale quite fast and is only perhaps useful up until the fixed version comes along. And even then if new, still affected, versions come out it even becomes 'wrong' in a way, and could trip someone up as if they are affected or not. The only information it really provides is when the vulnerability was noticed, as far as I can tell.

I also do hope it doesn't come to defining ecosystems for libraries, the repository URL solution seems like a better one, I do agree with @oliverchang that it all comes back to the source.

bagder commented 1 year ago

severity

We use vulnerability data and I can confirm that the low/medium/high/critical scales differ wildly between different vendors

Maybe, but the CVSS score is in no way better. NVD for example re-scores everything anyway so whatever we provide, they modify it anyway. The CVSS is also too one-dimensional and leaves out a lot of factors which tends to make the score higher than they would be when taken a larger take. Which is part of the reason why we in the curl project opt to not play the CVSS game.

There simply is no objective way to establish the severity of an issue. There are but opinions of different parties.

compliance

By not providing any info at all about severity or package the json object complies with the schema... Not sure how that makes the JSON better, but at least there is no complaints from the tool.

git ranges

Could I convince you otherwise?

Perhaps. It's a lot of work, and there are now several pieces of metadata we cannot provide anyway in the JSON object so the question is how much value we add by adding data to a partially supported JSON object?

Like this:

curl -d \ '{"commit": "6879efc2c1596d11a6a6ad296f80063b558d5e0f"}' \ "https://api.osv.dev/v1/query"

Doesn't that also need to tell which package it asks about? And you cannot ask about curl because it has no ecosystem so that field cannot be populated: so this command line cannot be run anyway.

bagder commented 1 year ago

I've started to update the advisories (and thus the json objects) with specific git commit info: introduced and fixed.

rhalar commented 1 year ago

Doesn't that also need to tell which package it asks about? And you cannot ask about curl because it has no ecosystem so that field cannot be populated: so this command line cannot be run anyway.

The ecosystem field is required in the package object, but the package object is not required. You can have only the ranges, as far as I'm aware, and the API takes just the commit hash.

bagder commented 1 year ago

the API takes just the commit hash

How does the API know which project/git repo to check then?

bagder commented 1 year ago

I have now gone through and provided git range information to what I believe is all curl CVEs filed since 2017 (current count: 85). Those advisories have been updated and the JSON objects are now populated automatically with metadata from those.

dfandrich commented 1 year ago

How does the API know which project/git repo to check then?

The spec says the git commit hashes must be full-length, which makes them unique across all projects, making the project or git repo unnecessary. I notice that the ones in the curl JSON are truncated.

bagder commented 1 year ago

which makes them unique across all projects

So it then scans through all projects to find a match? Didn't expect that.

The spec says the git commit hashes must be full-length

It sounds like the schema needs to be updated because it does not object to shortened hashes...

bagder commented 1 year ago

I made the script convert all hashes to the long format now.

bagder commented 1 year ago

that turned out to be a good idea, since that also verifies that the hash is correctly entered so it helped me find a few mistakes!

dfandrich commented 1 year ago

Are the old curl advisory IDs easily accessible (e.g. adv_20150108A)? Those are listed in places under "vendor advisory" so would be good to have in the JSON as an alias so they can be found that way.

bagder commented 1 year ago

We never used them as actual IDs in the project, we used them as URLs. We still have working redirects for those old URLs to end up on the appropriate current page.

Is it really a good idea to upgrade those to "official IDs" now when we haven't used them for the last five years? Surely the only users of those are using the URLs?

They could be resurrected from git, but then we would also need to invent a way to store it in the advisories and get it into the JSON.

dfandrich commented 1 year ago

I suspected it might not be easy any longer. It's probably not worth the effort then.

bagder commented 1 year ago

Are there any more details left to fix now? The JSON seems to verify against the schema now. We now provide several different JSON sets:

as an array of objects for all issues
all CVEs for a specific release
as a single object for a specific CVE

oliverchang commented 1 year ago

Thank you all and @bagder for this! The JSON output looks pretty great now :) And we'll soon work on ingesting this into osv.dev to make this useful to all OSV users.

The very last requests here are to:

Add a ID specific endpoint. e.g. something like https://curl.se/docs/vuln/<ID>.json to make individual entries easier to refer to.
Possibly publish this in a git repo as well? So that users can also open PRs to suggest edits to entries if necessary. This is what e.g. GitHub, PyPA does today to enable community contributions.

Point 2 is also helpful for our own immediate osv.dev ingestion as well, since right now we only support ingesting OSV from git repos, or GCS buckets. We have an in-process FR to support ingesting from HTTP endpoints as well (such as your current feed), but that may take a little bit of time.

bagder commented 1 year ago

Add a ID specific endpoint

Already done. Here's an example: https://curl.se/docs/CVE-2022-35252.json (there are links to these from the corresponding html version on the web)

We also have curl version specific endpoints. Example: https://curl.se/docs/vuln-7.88.1.json - which contains all the CVEs that affect that specific curl release.

Possibly publish this in a git repo as well?

Everything that ends up in the JSON objects is data that is already present in git.

The JSON files themselves are generated with a script that is also in git. The content for the JSON comes from each separate advisory markdown (example https://github.com/curl/curl-www/blob/master/docs/CVE-2021-22926.md) combined with the vuln.pm file (https://github.com/curl/curl-www/blob/master/docs/vuln.pm).

The JSON files themselves are not in git because it seems much better to do it this way. Now we can easily regenerate them whenever we need to. If we want to update the JSON objects or if any metadata changes.

oliverchang commented 1 year ago

Add a ID specific endpoint

Already done. Here's an example: https://curl.se/docs/CVE-2022-35252.json (there are links to these from the corresponding html version on the web)

Awesome!! Any way to get https://curl.se/docs/CURL-CVE-2022-35252.json to work as well? (to match the "id" field).

Everything that ends up in the JSON objects is data that is already present in git.

The JSON files themselves are not in git because it seems much better to do it this way. Now we can easily regenerate them whenever we need to. If we want to update the JSON objects or if any metadata changes.

Sure, that sounds reasonable.

bagder commented 1 year ago

Any way to get https://curl.se/docs/CURL-CVE-2022-35252.json to work as well? (to match the "id" field).

You are already making me regret faking that Id. No. That's not a real Id. That's a fake one we put there only because the schema doesn't allow us to use the only Id we have for the issue: the CVE Id. I don't want to add infra around this fake Id to make it seem like its real.

I made the "URL" key in the JSON use the URL to the CVE specific JSON instead of the HTML one like I had before. I think the official schema should have a URL field like this.

oliverchang commented 1 year ago

Any way to get https://curl.se/docs/CURL-CVE-2022-35252.json to work as well? (to match the "id" field).

You are already making me regret faking that Id. No. That's not a real Id. That's a fake one we put there only because the schema doesn't allow us to use the only Id we have for the issue: the CVE Id. I don't want to add infra around this fake Id to make it seem like its real.

It's sadly necessary when there are several vulnerability databases that export the same IDs. We need a way to disambiguate them. e.g. https://osv.dev/vulnerability/DLA-3288-1 also refers to the same CVE, but we know this is the Debian specific one because of the DLA ID.

For the sake of consistency, https://curl.se/docs/CURL-CVE-2022-35252.json would make it easier for users to discover from a given OSV ID in an automated way. What if https://curl.se/docs/CURL-CVE-2022-35252.json redirected to https://curl.se/docs/CVE-2022-35252.json instead?

bagder commented 1 year ago

But what would make anyone try accessing https://curl.se/docs/$curl-cve-id.json in the first place and assume it works?

oliverchang commented 1 year ago

But what would make anyone try accessing https://curl.se/docs/$curl-cve-id.json in the first place and assume it works?

A few places I can think of:

From the "Source" links provided by https://osv.dev: https://osv.dev/vulnerability/DLA-3288-1. We directly just append the OSV ID to a base URL to generate these.
We'd also like to add Curl to the list at https://ossf.github.io/osv-schema/#id-modified-fields, which also enumerates the "Source URL" to access an entry by ID.

bagder commented 1 year ago

Okay. It would be way better if you allowed the issues to provide working URL themselves instead of assuming how you can request them. But I've made https://curl.se/docs/CURL-$cve-id.json work now, and they will redirect to https://curl.se/docs/$cve-id.json.

oliverchang commented 1 year ago

Thank you for understanding! I'll close this for now, since I don't think there's any other changes needed on your end here.

curl / curl-www

OSV spec #240

severity

compliance

git ranges