gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0
73.44k stars 7.37k forks source link

Shorter fingerprinting #6241

Open XhmikosR opened 4 years ago

XhmikosR commented 4 years ago

This isn't super important, but it feels like having a 96 (sha384) or 64 (sha256) character hash is too much.

Would it be possible to offer an option to limit this, say to 7 characters? Assuming the algorithm is the same, this shouldn't result in any collisions I think?

I could try doing this myself, but I thought I'd ask.

Thanks!

bep commented 4 years ago

this shouldn't result in any collisions I think?

If that was true, why are they so long in the first place?

So, I agree that for certain uses of this it would be fine (cache-busting), so we should add a length or something option to truncate it.

XhmikosR commented 4 years ago

I meant for this specific use case :)

Thanks for considering this.

davidsneighbour commented 4 years ago

If this is about Subresource Integrity hashes please keep in mind, that the standard for the JS and CSS links expects SHA384 for the integrity hashes.

See https://www.w3.org/TR/SRI/

"Cache busting" is a new use for me ;) I always used date strings or random strings for that.

bep commented 4 years ago

Yes, we're not messing with the SRI hashes.

RalphCorderoy commented 3 years ago

If this is about Subresource Integrity hashes please keep in mind, that the standard for the JS and CSS links expects SHA384 for the integrity hashes.

I took that as SHA384 is the only digest allowed. In case others think the same, the relevant bits of the specification are:

Conformant user agents MUST support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a request’s integrity metadata and MAY support additional hash functions.

At the time of writing, SHA-384 is a good baseline.

RalphCorderoy commented 3 years ago

Please consider the case where cache-busting is not required but the integrity hash is still wanted, i.e. a length of 0 but without style..css being generated.

bep commented 3 years ago

@RalphCorderoy you can do like this:

{{ $fingerprinted := $css | fingerprint }}
<style src="{{ $css.RelPermalink }}" integrity="$fingerprinted.Data.Integrity" }}
karolyi commented 1 year ago

voting for this for cache-busting

jmooring commented 1 year ago

With the recent introduction of resources.Copy and crypto.FNV32a you can do:

{{ with resources.Get "sass/main.scss" }}
  {{ with $css := . | toCSS | minify | fingerprint }}
    {{ with resources.Copy (printf "css/style.%d.min.css" (crypto.FNV32a .Content)) . }}
      <link rel="stylesheet" href="{{ .RelPermalink }}" integrity="{{ $css.Data.Integrity }}" crossorigin="anonymous">
    {{ end }}
  {{ end }}
{{ end }}

Which renders to something like:

<link rel="stylesheet" href="/css/style.87871251.min.css" integrity="sha256-FcQqt3aNlV7AZnGV4zkQRVeCeJOxbMPnQSx258L803E=" crossorigin="anonymous">
bep commented 1 year ago

{{ with resources.Copy (printf "css/style.%d.min.css" (crypto.FNV32a .Content)) . }}

Yea, that works, but you end up recalculating that same hash over and over again -- a common thing with these resources is that they're included in "every" page.

jmooring commented 1 year ago

Yeah, I guess partialCached only gets you most of the way there.

bep commented 1 year ago

Yeah, I guess partialCached only gets you most of the way there.

Yea well, most of the way is good enough in this scenario, I guess. I have a fix for partialCached in the pipeline to make it only "run once" for a given key.

Also, the FNV32a is very fast.

karolyi commented 1 year ago

Or maybe just taking the first X characters from the original Data.Integrity, something along these lines?

https://embeddedartistry.com/blog/2017/07/05/printf-a-limited-number-of-characters-from-a-string/

jmooring commented 1 year ago

Using the un-minified bootstrap.css as the resource, with 100 pages, with a cold cache for each test...

With FNV32a cache buster as described above:

image

Without the cache buster:

image

The difference is noise.

most of the way is good enough in this scenario

I agree.

jmooring commented 1 year ago

So, the resources.Copy approach fails if you PostProcess the resource prior to copying. Which I think makes sense, but makes my head hurt.

bep commented 1 year ago

So, the resources.Copy approach fails if you PostProcess the resource prior to copying. Which I think makes sense, but makes my head hurt.

Yea, that's definitively not supported with how PostProcess currently works. But Isn't this a ... vanity issue?

sidharthv96 commented 1 year ago

Is there a way to use the first few characters of the hash for the Filename, not SRI. Cloudflare will not deploy the site as URLs are > 100 chars long. Even with Sha256, main.css will be 86 chars long. So, we're forced to use MD5 hash, hence affecting security.

"Error: Failed to publish your Function. Got error: Error 8000057: Rule (%s) in routes.json is over the 100 character limit. Refer to https://cfl.re/3FsE4aF."

If we could truncate the filename hash to 10 chars or so, it should be enough for cachebusting, while allowing us to use more secure SRI Hashes.

sidharthv96 commented 1 year ago

https://github.com/gohugoio/hugo/blob/9af78d11003ed22b6281e47e1ac19568cf1cca81/resources/resource_transformers/integrity/integrity.go#L79

I suppose truncating this should suffice. Happy to raise a PR if the solution is okay.

akbyrd commented 3 months ago

Adding context from my use case and an issue with current workarounds when typescript resources are renamed by js.Build.

I have a typescript file I want to compile to javascript, minify, add an SRI hash, and cache-bust with a reasonably short filename.

Given an input like main.ts, I'm looking to get this output

<script src="/main-G7Cf3v.js" integrity="sha256-..." crossorigin="anonymous"></script>

js.Build will rename the file from main.ts to main.js and will update .RelPermalink and .Permalink, but not .Name. This means I don't have a way to get the final extension without also publishing the resource. The strategy I'm currently using below mostly works, but it will end up with src="/main-G7Cf3v.ts" with .ts instead of .js.

{{ with resources.Get "main.ts" }}
    {{ $resource := . | js.Build (dict "minify" true) | fingerprint }}

    {{ $fingerprint = $resource.Data.Integrity }}
    {{ $hash := index (index (findRESubmatch `sha\d*-(.*)` $fingerprint) 0) 1 }}
    {{ $shortHash := $hash | first 6 }}
    {{ $shortHash = replace $shortHash "+" "p" }}
    {{ $shortHash = replace $shortHash "/" "f" }}
    {{ $shortHash = replace $shortHash "=" "e" }}

    {{/* $resource.Name is still main.ts instead of main.js */}}
    {{/* $resource.RelPermalink will cause a second copy to be published */}}

    {{ $newName := replaceRE `\.([^.]*\z)` (printf "-%s.$1" $shortHash) $resource.Name }}
    {{ $resource = $resource.Content | resources.FromString $newName }}

    <script src="{{ $resource.RelPermalink }}" integrity="{{ $fingerprint }" crossorigin="anonymous"></script>
{{ end }}

I can think of a few possible solutions:

  1. Add an option to fingerprint to limit the number of hash characters used to rename the file for cache-busting purposes.
  2. Add a method to move/rename resources. This would effectively allow .RelPermalink to be mutated.
  3. Add way way to get a resource's output name without publishing it.

I'm still new to hugo so hopefully I haven't gotten any of the details wrong.

jmooring commented 3 months ago

@akbyrd https://gohugo.io/functions/resources/copy/

akbyrd commented 3 months ago

I don't think that provides a solution to the filename issue (though I imagine it's more efficient than using resources.FromString?).

There's no way to get the proper name of the original resource (with .js) without publishing it.

jmooring commented 3 months ago

I just tested resources.Copy with a fingerprinted file, and it retains the hash in the name. That is, at least for me, unexpected. For example:

{{ resources.Copy "js/app.js" . }} → app.f85a8953420a757cf53434fead706fb0d2864bd93c5f7294fef74d84cfbc20ce.js

So, that's not terribly helpful.

Since you know you're building a js file, is there a problem with hardcoding the extension?

Also, if you can live with a 10 character hash, pass $fingerprint through https://gohugo.io/functions/crypto/fnv32a/.

akbyrd commented 3 months ago

I have a partial template that I use for all my resources. It isn't specific to javascript. It also wouldn't be possible to share with others without hard-coding all possible extension changes, or adding a caveat that it won't work for anything other than ts -> js.

I don't believe the hash persisted through a copy when I tried it, but I could be misremembering.

akbyrd commented 3 months ago

Also, if you can live with a 10 character hash, pass $fingerprint through https://gohugo.io/functions/crypto/fnv32a/.

I believe this still has the same issue: when I name the file, how do I get the correct extension without publishing the resource with the fingerprint in the name? Or hard-coding the extension so it only works with assets types I explicitly handle?

jmooring commented 3 months ago

My point with crypto.FNV32a was that you can eliminate 5 lines of code. This has nothing to do with the extension.

akbyrd commented 3 months ago

Ah gotcha. Thanks for the tip.

jmooring commented 3 months ago

Also see https://github.com/gohugoio/hugo/issues/12143 for access to the extension. I'm also going to log an issue against resources.Copy, which may be rejected, but it seems to me if I say "copy to foo" it should copy to "foo".

jmooring commented 3 months ago

Finally, there's an intentionally undocumented method .Key which will give you the extension you want, but there is no API promise on this. This is a cache key that just happens to include the extension, but you might be able to use it until the key format changes.

This seems to work OK with v0.124.0:

{{ $assetPath := "ts/main.ts" }}
{{ with resources.Get $assetPath }}
  {{ $basename := path.BaseName .Name }}
  {{ with . | js.Build | fingerprint }}
    {{ $ext := path.Ext .Key }}
    {{ $hash := crypto.FNV32a .Data.Integrity }}
    {{ with resources.Copy (printf "%s-%d%s" $basename $hash $ext) . }}
      <script src="{{ .RelPermalink }}" integrity="{{ .Data.Integrity }}" crossorigin="anonymous"></script>
    {{ end }}
  {{ end }}
{{ else }}
  {{ errorf "Unable to get global asset %q" $assetPath }}
{{ end }}

This produces something like main-4229906572.js in the root of the public directory.

Use with caution. We may change the value returned by .Key at any time.

@akbyrd

Seirdy commented 3 months ago

I use a quick crypto.FNV32a-based fix for short cache-busting fingerprints that doesn’t directly rely on the unstable .Key method.

I use Hugo’s crypto.FNV32a to generate a short hash, then copy the resource to a new path with that fingerprint.
{{ $resource := resources.Get . -}}
{{- $target_path_formatStr := (replaceRE `(\.[^\.]*)$` ".%d$1" .) -}}
{{- $cacheBuster := $resource.Content | crypto.FNV32a -}}
{{- $target_path := printf $target_path_formatStr $cacheBuster -}}
{{- return resources.Copy $target_path $resource -}}
You can see it used in my site’s head element. I invoke it using partialCached so the fingerprinting only happens once per resource:
{{ $icon_svg := partialCached "cache-bust.html" "/favicon.svg" "/favicon.svg" }}
{{- printf `<link rel="icon" sizes="any" href="%s" type="image/svg+xml" />` $icon_svg.RelPermalink | safeHTML }}
Here’s a snippet of the final rendered result:
```html ```

Encoding it to a higher base and using alphanumerics could shave off 1-2 ch.


Originally posted on seirdy.one: See Original.

jmooring commented 2 months ago

@Seirdy With this approach you don't have an SRI hash. If you don't fingerprint, the resource's .Data.Integrity is nil. This issue is about short fingerprints in conjunction with fingerprinting.