curl / curl-fuzzer

Quality assurance testing for the curl project

download_zlib: pick latest version automatically #62

Closed vszakats closed 1 year ago

vszakats commented 1 year ago

zlib recently started deleting the previous release when publishing a new one. This in turn breaks this script and the fuzzer that depends on it.

This patch detects the latest version and downloads that automatically.

~This trades off this maintenance burden (and occasional days of CI breakage) for some complexity~ (and for the fact that zlib can change under the hood without notice, but that seems a net positive overall in this case).

For extra protection it'd always be nice to verify the signature too, but that's for another day.
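
For illustration, a rough sketch of what that could look like, assuming the detached signature is published next to the tarball as zlib.tar.gz.asc (an assumed path) and the zlib signing key is already in the local keyring:

curl -fsSLO https://zlib.net/zlib.tar.gz      # the tarball itself
curl -fsSLO https://zlib.net/zlib.tar.gz.asc  # detached signature (assumed location)
gpg --verify zlib.tar.gz.asc zlib.tar.gz      # exits non-zero on a bad signature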

cmeister2 commented 1 year ago

Yeah I want to spend a little time seeing if there's a better solution here - I'll have a quick think this afternoon.

vszakats commented 1 year ago

Normally I do this kind of parsing with html-xml-utils, but that's likely another step to install. (Or pup, but that's not usually available from default repos.)

Here are the lines doing that (from curl-for-win): https://github.com/curl/curl-for-win/blob/1cce572e11372950d431250709f2ec5eb17cebd4/_bumper.sh#L6-L16 and https://github.com/curl/curl-for-win/blob/1cce572e11372950d431250709f2ec5eb17cebd4/_dl.sh#L299-L300
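
For illustration only, a plain grep/sed stand-in for those lines (this assumes the zlib.net front page keeps linking the release tarball as zlib-<version>.tar.xz):

# Scrape the front page for the first tarball link and strip it down to the version.
ZLIB_VERSION="$(curl -fsS https://zlib.net/ \
  | grep -o -E 'zlib-[0-9][0-9.]*\.tar\.xz' | head -n 1 \
  | sed -E 's/^zlib-(.+)\.tar\.xz$/\1/')"  # e.g. "1.2.13"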

cmeister2 commented 1 year ago

So I think we can achieve this with the following (going off the configure code in the zlib git mirror):

# Fetch zlib.h from the zlib git mirror, then pull the version string out of
# its ZLIB_VERSION #define, the same way zlib's own configure script does:
wget -O /tmp/zlib.h https://raw.githubusercontent.com/madler/zlib/master/zlib.h
ZLIB_VERSION=$(sed -n -e '/VERSION "/s/.*"\(.*\)".*/\1/p' /tmp/zlib.h)

Just uses normal wget; no need for extra packages.

vszakats commented 1 year ago

That could work, but it would depend on parsing two text files and on the two zlib deploy targets being in perfect sync.

We could query the GitHub API for the latest version and do the download from there, but that will require jq.

cmeister2 commented 1 year ago

> That could work, but it would depend on parsing two text files and on the two zlib deploy targets being in perfect sync.
>
> We could query the GitHub API for the latest version and do the download from there, but that will require jq.

Parsing two files? It's downloading one file (zlib.h) and parsing that one file - it's no different from parsing a webpage.

https://api.github.com/repos/madler/zlib/releases/latest does seem more stable as an option; and yes, it will need jq or some judicious abuse of sed to extract the necessary details from the json.
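
A quick sketch of the API route (this assumes zlib keeps tagging releases like "v1.2.13" and that the GitHub tag archive is an acceptable source):

# Ask the GitHub API for the latest release tag, then fetch that tag's archive.
tag="$(curl -fsS https://api.github.com/repos/madler/zlib/releases/latest | jq -r '.tag_name')"
curl -fsSLo zlib.tar.gz "https://github.com/madler/zlib/archive/refs/tags/${tag}.tar.gz"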

vszakats commented 1 year ago

OK, with some trial and error I've found out the permalink for the always-latest zlib source code. No parsing necessary.

vszakats commented 1 year ago

[ @cmeister2: You're right, sorry, it's parsing HTML vs source code. Not double parsing. ]

cmeister2 commented 1 year ago

> OK, with some trial and error I've found out the permalink for the always-latest zlib source code. No parsing necessary.

I'm a little dubious of this interface... I can't find any links to, or documentation for it at all. But I guess if it works it works... Is it definitely the latest version?

vszakats commented 1 year ago

It's the latest now:

$ curl -s https://zlib.net/zlib.tar.gz | tar t
zlib-1.2.13/
zlib-1.2.13/zutil.h
zlib-1.2.13/inftrees.h
[...]

vszakats commented 1 year ago

Changed www.zlib.net → zlib.net, to use the canonical domain.

dfandrich commented 1 year ago

Looks like you have a solution already, but another option would be to use https://glare.now.sh/madler/zlib/gz which is a third-party site that redirects to the latest tarball in a Github project. You'd really want to check signatures in that case because it's going through a non-Github site.
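
For reference, usage would presumably be just this (with -L to follow the redirect, and a signature check afterwards as noted):

curl -fsSL -o zlib.tar.gz https://glare.now.sh/madler/zlib/gz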

cmeister2 commented 1 year ago

If I were being super strict, we should be vendoring our build dependencies so we're not as dependent on build-time downloads.

There are a few different options there; I might start a wiki page for discussion.

vszakats commented 1 year ago

Another option is to set one or more fallback mirrors (e.g. GitHub) in case the canonical one fails. Verifying the download against a signature, or at least a known-good hash, would be even more important in that case. (These add some moving parts, of course.)
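
A minimal sketch of that idea (the mirror URL and the hash are placeholders, and a single pinned hash only works if every mirror serves the byte-identical tarball):

ZLIB_SHA256='<known-good sha256>'  # placeholder: the real hash would be kept in-repo
for url in 'https://zlib.net/zlib-1.2.13.tar.gz' \
           'https://mirror.example/zlib-1.2.13.tar.gz'; do  # hypothetical mirror
  curl -fsSL -o zlib.tar.gz "$url" || continue               # try the next mirror on failure
  echo "${ZLIB_SHA256}  zlib.tar.gz" | sha256sum -c - && break
done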

If the concern is the potential flakiness of downloads, setting retries and a timeout may be a near-zero-cost option to try first.
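
Something like this, using curl's built-in knobs (the values are just examples; --retry-all-errors needs curl 7.71+):

curl -fsSL --retry 5 --retry-all-errors --connect-timeout 15 --max-time 300 \
  -o zlib.tar.gz https://zlib.net/zlib.tar.gz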