golang / go

io/fs, net/http: define interface for automatic ETag serving #60940

Open oliverpool opened 1 year ago

oliverpool commented 1 year ago

Renewal of #43223

In the discussion of io/fs and embed, a few people asked for automatic serving of ETag headers for static content, using content hashes.

Here is a proposal which tries to address the concerns raised in #43223.

Accepted proposal

In io/fs, define

// HashFileInfo provides hashes of the file content in constant time.
type HashFileInfo interface {
    FileInfo
    // Hash returns content hashes of the file that uniquely
    // identify the file contents.
    //
    // Hash must NOT compute any hash of the file during the call.
    // That is, it must run in time O(1) not O(length of file).
    // If no content hash is already available, Hash should
    // return nil rather than take the time to compute one.
    //
    // The order of the returned hashes must be constant (preferred hashes first).
    Hash() []Hash
}
// Hash indicates the hash of a given content.
type Hash struct {
    // Algorithm indicates the algorithm used. Implementations are encouraged
    // to use a package-like name for interoperability with other systems
    // (lowercase, without dashes, e.g. sha256, sha1, crc32).
    Algorithm string
    // Sum is the result of the hash; it should not be modified by the caller.
    Sum []byte
}
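
As an illustration of the constant-time requirement, here is a minimal sketch of a FileInfo that satisfies the proposed interface. Hash and HashFileInfo are redeclared locally (the sketch does not assume they exist in io/fs), and memFileInfo and newMemFileInfo are hypothetical names invented for this example. The hash is computed once at construction, so the Hash method only returns stored data.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io/fs"
	"time"
)

// Hash and HashFileInfo mirror the definitions proposed above. They are
// declared locally because this sketch does not assume they exist in io/fs.
type Hash struct {
	Algorithm string
	Sum       []byte
}

type HashFileInfo interface {
	fs.FileInfo
	Hash() []Hash
}

// memFileInfo is a hypothetical FileInfo for an in-memory file whose
// hash was computed once, when the file was created.
type memFileInfo struct {
	name string
	size int64
	sum  [sha256.Size]byte
}

// newMemFileInfo does the O(length of file) work up front, so that
// Hash never has to read the content.
func newMemFileInfo(name string, content []byte) *memFileInfo {
	return &memFileInfo{
		name: name,
		size: int64(len(content)),
		sum:  sha256.Sum256(content),
	}
}

func (fi *memFileInfo) Name() string       { return fi.name }
func (fi *memFileInfo) Size() int64        { return fi.size }
func (fi *memFileInfo) Mode() fs.FileMode  { return 0o444 }
func (fi *memFileInfo) ModTime() time.Time { return time.Time{} }
func (fi *memFileInfo) IsDir() bool        { return false }
func (fi *memFileInfo) Sys() any           { return nil }

// Hash returns the stored sum in O(1).
func (fi *memFileInfo) Hash() []Hash {
	return []Hash{{Algorithm: "sha256", Sum: fi.sum[:]}}
}

// Compile-time check that memFileInfo satisfies the proposed interface.
var _ HashFileInfo = (*memFileInfo)(nil)

func main() {
	var info fs.FileInfo = newMemFileInfo("hello.txt", []byte("hello"))
	if h, ok := info.(HashFileInfo); ok {
		for _, hash := range h.Hash() {
			fmt.Printf("%s %x\n", hash.Algorithm, hash.Sum)
		}
	}
}
```

A file system that has no precomputed hash would return nil from Hash instead of hashing on demand.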

Then, in net/http, serveFile calls Stat, and if the result implements HashFileInfo, it calls info.Hash. If that returns at least one hash, serveFile uses the first one as the ETag header, formatted as Algorithm+":"+base64(Sum).
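
The formatting step can be sketched as follows. The helper name etagFor is hypothetical, and two details are assumptions not stated in the proposal text: the value is wrapped in double quotes (as HTTP entity tags are), and base64.StdEncoding is used for the base64 step.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// Hash mirrors the struct proposed above; it is declared locally
// because this sketch does not assume it exists in io/fs.
type Hash struct {
	Algorithm string
	Sum       []byte
}

// etagFor is a hypothetical helper that builds an ETag value from the
// first (preferred) hash. It returns "" when no hash is available.
// Assumptions: the tag is quoted and uses standard base64 with padding.
func etagFor(hashes []Hash) string {
	if len(hashes) == 0 {
		return ""
	}
	h := hashes[0]
	return `"` + h.Algorithm + ":" + base64.StdEncoding.EncodeToString(h.Sum) + `"`
}

func main() {
	hashes := []Hash{{Algorithm: "sha256", Sum: []byte{0x01, 0x02, 0x03}}}
	fmt.Println(etagFor(hashes)) // prints "sha256:AQID" (including the quotes)
	fmt.Println(etagFor(nil) == "")
}
```

The three-byte Sum is only a placeholder to keep the output short; a real sha256 sum is 32 bytes long.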

In package embed, the file type would add a Hash method and an assertion that it implements HashFileInfo. It would return a single hash with Algorithm “sha256”.


Original proposal (fs.File)

First, in io/fs, define

// A ContentHashFile is a file that can return hashes of its content in constant time.
type ContentHashFile interface {
    File

    // ContentHash returns content hashes of the file that uniquely
    // identify the file contents.
    // The returned hashes should be of the form algorithm-base64.
    // Implementations are encouraged to use sha256, sha384, or sha512
    // as the algorithms and RawStdEncoding as the base64 encoding,
    // for interoperability with other systems (e.g. Subresource Integrity).
    //
    // ContentHash must NOT compute any hash of the file during the call.
    // That is, it must run in time O(1) not O(length of file).
    // If no content hash is already available, ContentHash should
    // return nil rather than take the time to compute one.
    ContentHash() []string
}

Second, in net/http, when serving a File (in serveFile, right before serveContent, for instance), if the file implements ContentHashFile and ContentHash returns a hash that is alphanumeric (no spaces, no Unicode, no symbols, to avoid any kind of header problem), use that hash as the default ETag.

func setEtag(w http.ResponseWriter, file File) {
    if ch, ok := file.(fs.ContentHashFile); ok {
        if w.Header().Get("Etag") != "" {
            return
        }
        for _, h := range ch.ContentHash() {
            // TODO: skip the hash if unsuitable (space, unicode, symbol)
            // TODO: should the etag be weak or strong?
            w.Header().Set("Etag", `W/"`+h+`"`)
            break
        }
    }
}
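
The first TODO above (skipping unsuitable hashes) could be handled along these lines. This is only a sketch: suitableForETag and this variant of setETag are hypothetical, and the accepted character set is an assumption. It allows ASCII letters and digits plus '-', '+' and '/', since the algorithm-base64 form with RawStdEncoding can contain those characters. The weak "W/" prefix simply mirrors the snippet above; the weak-or-strong question is left open.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// suitableForETag reports whether h is safe to place inside a quoted
// ETag value. Assumption: only ASCII alphanumerics and the characters
// '-', '+', '/' are accepted; everything else (spaces, quotes, any
// non-ASCII byte) is rejected.
func suitableForETag(h string) bool {
	if h == "" {
		return false
	}
	for i := 0; i < len(h); i++ {
		c := h[i]
		switch {
		case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
			// alphanumeric: accepted
		case c == '-' || c == '+' || c == '/':
			// needed for the algorithm-base64 form: accepted
		default:
			return false
		}
	}
	return true
}

// setETag sets a weak ETag from the first suitable hash, unless the
// handler has already set one.
func setETag(w http.ResponseWriter, hashes []string) {
	if w.Header().Get("Etag") != "" {
		return
	}
	for _, h := range hashes {
		if suitableForETag(h) {
			w.Header().Set("Etag", `W/"`+h+`"`)
			return
		}
	}
}

func main() {
	rec := httptest.NewRecorder()
	// "sha256-AQID" is a placeholder value, not a real sha256 sum.
	setETag(rec, []string{"bad hash", "sha256-AQID"})
	fmt.Println(rec.Header().Get("Etag"))
}
```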

Third, add the ContentHash method to the http.ioFile type (as a proxy to the ContentHash method of the underlying fs.File).
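
A rough sketch of such a proxy follows. The ioFile type here is a simplified stand-in for the unexported wrapper in net/http, ContentHashFile is redeclared locally, and hashedFile and plainFile are hypothetical test types; none of this is the actual net/http code.

```go
package main

import (
	"fmt"
	"io/fs"
)

// ContentHashFile mirrors the interface proposed above; it is declared
// locally because this sketch does not assume it exists in io/fs.
type ContentHashFile interface {
	fs.File
	ContentHash() []string
}

// ioFile is a simplified stand-in for the wrapper that net/http places
// around an fs.File.
type ioFile struct {
	file fs.File
}

// ContentHash forwards to the wrapped file when it provides hashes and
// returns nil otherwise.
func (f ioFile) ContentHash() []string {
	if ch, ok := f.file.(ContentHashFile); ok {
		return ch.ContentHash()
	}
	return nil
}

// hashedFile is a hypothetical file with precomputed hashes. The
// embedded fs.File is left nil because this sketch never reads from it.
type hashedFile struct {
	fs.File
	hashes []string
}

func (f hashedFile) ContentHash() []string { return f.hashes }

// plainFile is a hypothetical file without any content hash.
type plainFile struct {
	fs.File
}

func main() {
	// "sha256-AQID" is a placeholder value, not a real sha256 sum.
	withHash := ioFile{file: hashedFile{hashes: []string{"sha256-AQID"}}}
	withoutHash := ioFile{file: plainFile{}}
	fmt.Println(withHash.ContentHash())
	fmt.Println(withoutHash.ContentHash() == nil)
}
```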

Fourth (probably out of scope for this proposal), add the ContentHash method on embed.FS files.

This proposal addresses the following objections:

> The API as proposed does not let the caller request a particular implementation.

The caller simply gets all available hashes and can filter them.

> The API as proposed does not let the implementation say which hash it used.

Implementers are encouraged to indicate the algorithm used for each hash.

> The API as proposed does not let the implementation return multiple hashes.

This one does.

> What is expected to happen if the ContentHash returns an error before transport?

This implementation cannot return an error (the implementer can choose to panic, but returning nil seems better suited).

> Drop this proposal and let third-party code fill this need.

This is currently very cumbersome, since the middleware would need to open the file as well (which means having the exact same logic regarding URL cleanup as the http.FileServer). Here is my attempt: https://git.sr.ht/~oliverpool/exp/tree/main/item/httpetag/fileserver.go (even uglier, since I have to use reflect to retrieve the underlying fs.File from the http.File).
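
The reply to the first objection (the caller receives every available hash and filters them) could look like this on the caller side. pickHash is a hypothetical helper, and it assumes the hashes follow the algorithm-base64 form recommended in the ContentHash documentation above.

```go
package main

import (
	"fmt"
	"strings"
)

// pickHash returns the first hash that uses the requested algorithm,
// assuming each hash has the form "algorithm-base64".
func pickHash(hashes []string, algorithm string) (string, bool) {
	prefix := algorithm + "-"
	for _, h := range hashes {
		if strings.HasPrefix(h, prefix) {
			return h, true
		}
	}
	return "", false
}

func main() {
	// Placeholder values, not real digests.
	hashes := []string{"sha512-AAAA", "sha256-AQID"}

	if h, ok := pickHash(hashes, "sha256"); ok {
		fmt.Println("using", h)
	}
	if _, ok := pickHash(hashes, "sha384"); !ok {
		fmt.Println("no sha384 hash available")
	}
}
```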


Could a "github-collaborator" post a message in #43223 to notify the people who engaged with the previous proposal about this updated one?

oliverpool commented 4 months ago

Is there anything that can be done to make some progress on this issue?

Is the current solution good enough? https://github.com/golang/go/issues/60940#issuecomment-2037469632

neild commented 4 months ago

It seems that cmd/internal/codesign is using notsha256 and inverting the result to produce an actually-sha256 hash.

This is disgusting, but it's precedent. Should we do the same for embedded file hashes? (We'd also need to increase the size of the stored hash, since it's only 128 bits right now.)
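
To make the inversion idea concrete, here is a small sketch. It assumes, as the comment above describes, that notsha256 is SHA-256 with every bit of the result flipped. Since cmd/internal/notsha256 cannot be imported from outside the Go toolchain, notSHA256 below is a hypothetical stand-in built on crypto/sha256.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// notSHA256 is a stand-in for the internal notsha256 package, under the
// assumption that it yields the bitwise NOT of the SHA-256 sum.
func notSHA256(data []byte) [sha256.Size]byte {
	sum := sha256.Sum256(data)
	for i := range sum {
		sum[i] = ^sum[i]
	}
	return sum
}

// invert flips every bit of sum. Applied to a notSHA256 result, it
// recovers the plain SHA-256 sum.
func invert(sum [sha256.Size]byte) [sha256.Size]byte {
	for i := range sum {
		sum[i] = ^sum[i]
	}
	return sum
}

// roundTrips reports whether inverting the notSHA256 sum of data gives
// back its SHA-256 sum.
func roundTrips(data []byte) bool {
	return invert(notSHA256(data)) == sha256.Sum256(data)
}

func main() {
	fmt.Println(roundTrips([]byte("hello"))) // prints true
}
```

Because the inversion is a byte-wise NOT, it only recovers a full SHA-256 sum if all 256 bits are stored, which is why the stored hash size would need to grow.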

oliverpool commented 3 months ago

> Should we do the same for embedded file hashes?

Who can take the responsibility to answer this question?

I would really appreciate it if we could move this issue forward.

oliverpool commented 1 month ago

Friendly ping: should we rely on the fact that notsha256 is an inverted sha256, like cmd/internal/codesign already does?