Use If-Modified-Since & Content-MD5 HTTP headers to examine remote dependencies

In thinking about using biomake to build a continuous-integration data aggregation pipeline (@cmungall), I'm imagining a model for database adapters that basically gives you a macro of the form:

BIOMAKE_CURL(downloaded_file_path,url_of_file)

At a crude level this can just be imagined as expanding into the following Makefile recipe (indeed, one could retain legacy compatibility if using this with GNU Make by defining BIOMAKE_CURL to expand into something like this):

downloaded_file_path:
	curl --output $@ url_of_file

However, behind the scenes, biomake will attempt to propagate the dependency check across the connection, either by

- using the HTTP If-Modified-Since header with a GET method to check if the remote file has a later modification time than the local copy, or
- (if running with --md5-hash) using a HEAD method to retrieve the HTTP header, and comparing the Content-MD5 header field to the locally stored MD5 hash.
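
To make the two checks concrete, here is a rough Python sketch of what they could look like on the wire. This is not biomake code: the function names (`http_date`, `local_content_md5`, `fetch_if_modified`, `remote_md5_differs`) are made up for illustration, and it assumes the local file's mtime is a fair stand-in for "when we last fetched it". Per RFC 1864, Content-MD5 carries the base64 encoding of the raw 16-byte digest, not the hex string.

```python
import base64
import hashlib
import os
import urllib.error
import urllib.request
from email.utils import formatdate


def http_date(mtime):
    """Format a Unix timestamp as an HTTP date (always GMT)."""
    return formatdate(mtime, usegmt=True)


def local_content_md5(path):
    """Base64-encoded MD5 digest of a local file, as Content-MD5 encodes it."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def fetch_if_modified(path, url):
    """Conditional GET. Returns True if a newer copy was downloaded."""
    headers = {}
    if os.path.exists(path):
        # Assumption: the local mtime approximates the last download time.
        headers["If-Modified-Since"] = http_date(os.path.getmtime(path))
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the target is up to date
            return False
        raise
    with open(path, "wb") as out:
        out.write(body)
    return True


def remote_md5_differs(path, url):
    """HEAD request. True/False if comparable, None if no Content-MD5 sent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        remote = response.headers.get("Content-MD5")
    if remote is None:
        return None  # server does not support the header; caller must fall back
    if not os.path.exists(path):
        return True
    return remote.strip() != local_content_md5(path)
```

The `None` return from `remote_md5_differs` matters in practice: a server that omits Content-MD5 should probably trigger a fallback to the If-Modified-Since path rather than a forced rebuild. Note also that RFC 7231 removed Content-MD5 from the HTTP specification, so support is likely to get patchier rather than better.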

The Content-MD5 idea is a bit dicey because it may not be well-supported (e.g. Apache can do it but only by computing the MD5 hash every time; it doesn't cache it). We could pretty easily whip up a node-express plugin that would cache the hash, I expect.
