ZOSOpenTools / meta

Meta repository to tie together the various underlying z/OS Open Source tools repositories here
https://zosopentools.github.io/meta/
Apache License 2.0

Move from tarball releases to git repos for all packages in order to have commit hashes #772

Open KeplerBoyce opened 1 month ago

KeplerBoyce commented 1 month ago

We are unable to get commit hashes for packages built from tarball releases, which currently prevents the create_cve_json Python script from #765 from querying the osv.dev database for vulnerabilities in those packages. Unless there is a way to get commit hashes from tarball releases, or a general way to query the osv.dev database without commit hashes, we should move towards building all packages from git repos rather than tarball releases.

KeplerBoyce commented 1 month ago

It looks like the query endpoint on the osv.dev API also allows querying by package name and version. That may be an option in the create_cve_json script for packages where we don't have the commit hash, or even a replacement for commit hashes altogether. Here's an example from the docs:

curl -d \
  '{"package": {"name": "mruby"}, "version": "2.1.2rc"}' \
  "https://api.osv.dev/v1/query"

The docs say that querying by name also requires a package ecosystem field, but the example they gave doesn't do this, so maybe that isn't actually the case? I tried some requests using a few different packages with vulnerabilities and it seems to work fine with only the name and version.

I think if we did it this way we would also need to modify the script that creates the releases JSON cache a little bit to include version numbers for releases. (We could also get this from the file name in the release assets, but I'm not sure if that would be completely reliable?)
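
For reference, the curl request above could look roughly like this in Python. This is only a sketch: `build_name_version_query` and `query_osv` are hypothetical names, not functions from the create_cve_json script, and the `ecosystem` argument is left optional because it is unclear from the docs whether it is actually required.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_name_version_query(name, version, ecosystem=None):
    """Build the JSON body for a name/version query (hypothetical helper)."""
    package = {"name": name}
    if ecosystem is not None:
        # Only send the ecosystem field when the caller knows it.
        package["ecosystem"] = ecosystem
    return {"package": package, "version": version}


def query_osv(payload, url=OSV_QUERY_URL, timeout=30):
    """POST a query body to the endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `query_osv(build_name_version_query("mruby", "2.1.2rc"))` would send the same body as the curl example from the docs.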

IgorTodorovskiIBM commented 1 month ago

I recall having trouble with the name/version query API before. For example:

curl -d \
  '{"commit": "3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0"}' \
  "https://api.osv.dev/v1/query"

from the git 2.45.0 build returns something, but:

curl -d \
  '{"package": {"name": "git"}, "version": "2.45.0"}' \
  "https://api.osv.dev/v1/query"

does not. Which name/version queries worked for you? It could be a good fallback approach when the community commit sha is not available.

That version info is already in the metadata.json for each release, so we could add that into the zopen_releases.json.
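
A minimal sketch of that fallback, assuming the caller already has whatever commit sha, name, and version are known for a release. `build_osv_query` is a hypothetical helper, not part of the existing scripts; it only picks the request body and leaves sending it to the caller.

```python
def build_osv_query(commit=None, name=None, version=None):
    """Prefer a commit-hash query; fall back to name/version (hypothetical helper)."""
    if commit:
        # The commit sha is the preferred key when we have one.
        return {"commit": commit}
    if name and version:
        # Fallback for releases where no commit sha is available.
        return {"package": {"name": name}, "version": version}
    raise ValueError("need either a commit hash or both a name and a version")
```

With the git example above, `build_osv_query(commit="3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0")` gives the commit query, while `build_osv_query(name="git", version="2.45.0")` gives the name/version query.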

KeplerBoyce commented 1 month ago

Hmm, you're right actually. I tried some more queries, and using name and version seems to return nothing for a lot of package releases for which the commit hash query does return vulnerabilities (like your example with git 2.45.0). I think I just got lucky with the arbitrary packages/versions that I tested (harfbuzz, redis, and npm), which all respond with vulnerabilities for both queries.

I took a closer look, though, and the responses are actually a little bit different between name/version and commit hash queries for all three of those packages. Harfbuzz responds with several more vulnerabilities for the commit hash query, and redis and npm respond with several more for the name/version query. Maybe we could do both queries for every package and combine the results?

It seems like we would still ideally like to have the community commit sha for every package, as we would likely miss a lot of vulnerabilities using only name/version.
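
Running both queries and combining the results could be sketched like this. It assumes each response is the decoded JSON from the query endpoint, where matches are listed under `vulns` and each entry has an `id`; `merge_osv_results` is a hypothetical name, not something in the existing script.

```python
def merge_osv_results(*responses):
    """Combine query responses, keeping one entry per vulnerability id."""
    merged = {}
    for response in responses:
        # A query with no matches returns an empty object, so "vulns" may be absent.
        for vuln in response.get("vulns", []):
            # Keep the first copy seen for each id; later duplicates are skipped.
            merged.setdefault(vuln["id"], vuln)
    return list(merged.values())
```

For a package with a known commit sha, the script could call this with the commit hash response and the name/version response; for a tarball release it would just pass the name/version response on its own.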