gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0

Add cachebursting for static resources #621

Closed sajal closed 6 years ago

sajal commented 9 years ago

Hi, I am evaluating Hugo as a static site generator and have a feature request. If my theme has <link rel="stylesheet" href="{{ .Site.BaseUrl }}/css/poole.css">, it gets rendered as <link rel="stylesheet" href="http://localhost:1313/css/poole.css">.

I would like it to be rendered something like <link rel="stylesheet" href="http://localhost:1313/css/poole.css?1474196d530086a378d0dfe1ae06fa3a">

^ Such that some sort of hash, based on the file contents, is appended to the end of the URL. This way, if I edit the CSS and regenerate the site, the reference points to a different URL, so CDNs and browsers know that the resource has changed and fetch the correct one. It would be a pain to manage this manually for all images, JavaScript, and CSS.

Feature request: Some template function which can return a hash (md5, sha or even lastmodified timestamp) for a given static resource.

-Sajal

dunn commented 9 years ago

Sounds like {{ .Now.Unix }} might work—it just prints the current Unix time, but that should do well enough for what you need, right?

sajal commented 9 years ago

That might work... but it's too aggressive, i.e. I would be asking users to refetch the CSS/assets whenever I change any content, irrespective of whether the CSS was changed or not. I guess we can use .Now.Unix for now.

bep commented 9 years ago

I started writing this, then realized this (might) only work for Page (not Node - as in home page, categories etc.), but with 0.13-DEV you could use {{ .UniqueId }}. This is an MD5 hash of the content. You could test it out; maybe @halostatue could chip in with info on Page vs. Node. If it's not on Node, maybe it could somehow be added.

halostatue commented 9 years ago

It’s not on a Node. I think this may be something that, because Hugo knows when things have been updated for static file copying, it may need to provide this functionality.

sajal commented 9 years ago

I propose a template function, .GetHash, to be used like {{ .GetHash "static/style.css" }}. The HTML would look like: <link rel="stylesheet" href="{{ .Site.BaseUrl }}/css/poole.css?{{ .GetHash "themes/something/static/css/poole.css" }}">

.GetHash could be MD5 or SHA1 or something...
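The proposal above can be sketched with Go's standard template machinery. This is only an illustration: `getHash`, `hashBytes`, and `renderLink` are hypothetical names, not Hugo functions, and the sketch registers a plain function rather than a `.GetHash` method.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"html/template"
	"os"
	"path/filepath"
	"strings"
)

// hashBytes returns the hex-encoded MD5 digest of data.
func hashBytes(data []byte) string {
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

// getHash is a hypothetical template function: it reads the file at
// path and returns the hash of its contents.
func getHash(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return hashBytes(data), nil
}

// renderLink renders a stylesheet tag whose query string is the
// content hash of the file at cssPath.
func renderLink(cssPath string) (string, error) {
	tmpl, err := template.New("link").
		Funcs(template.FuncMap{"getHash": getHash}).
		Parse(`<link rel="stylesheet" href="/css/poole.css?{{ getHash . }}">`)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, cssPath); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Write a throwaway stylesheet so the example is self-contained.
	dir, err := os.MkdirTemp("", "gethash")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	cssPath := filepath.Join(dir, "poole.css")
	if err := os.WriteFile(cssPath, []byte("body { margin: 0 }"), 0o644); err != nil {
		panic(err)
	}
	link, err := renderLink(cssPath)
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```

The query string only changes when the file contents change, which is the property the .Now.Unix workaround lacks.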

halostatue commented 9 years ago

What about http://discuss.gohugo.io/t/hugo-link-helper-methods/334?

sajal commented 9 years ago

.Asset with file renaming is the most ideal solution, but you don't really need to change the filename. Simply appending querystring would do the trick.

halostatue commented 9 years ago

Unique filenames are safer and more resistant to aggressive intermediaries on the network. (It’s one of the things that I think that Sprockets gets right for the Rails Asset Pipeline.) I’ll see if I can figure out how this might be done.

jameslai commented 9 years ago

I love this idea and the plan to implement it within Hugo, however I see one notable challenge: assets referred in CSS. Sprockets addresses this issue by pre-processing CSS, and CSS is technically an ERB file. Their implementation looks like:

.class { background-image: url(<%= asset_path 'image.png' %>) }

Hugo at the moment doesn't leverage any kind of CSS pre-processing, so renamed images would slide out from underneath the CSS.

halostatue commented 9 years ago

You’re correct on this, and I don’t have a good answer to this particular problem without doing similar manipulations to Sprockets (and I don’t want to port Sprockets over to Go for this). The short answer may be that you need to manually version images, which will force the CSS to be updated and result in cache-busting for both.

We could also pre-process the CSS looking for url(partial-path) and do a simplistic regex-based substitution—but I’m not fond of that.
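For reference, the simplistic regex-based substitution mentioned above might look roughly like this in Go. It is a sketch under assumptions: `rewriteCSS` and the manifest map are hypothetical, the renamed file name is made up, and the pattern ignores comments, escapes, and `@import` strings, which is part of why the approach is fragile.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlRef matches url(...) references with optional single or double quotes.
var urlRef = regexp.MustCompile(`url\(\s*['"]?([^'")\s]+)['"]?\s*\)`)

// rewriteCSS replaces each url(path) whose path appears in manifest
// with the renamed path; unknown paths are left untouched.
func rewriteCSS(css string, manifest map[string]string) string {
	return urlRef.ReplaceAllStringFunc(css, func(match string) string {
		path := urlRef.FindStringSubmatch(match)[1]
		if renamed, ok := manifest[path]; ok {
			return "url(" + renamed + ")"
		}
		return match
	})
}

func main() {
	css := `.class { background-image: url('image.png') }`
	// Hypothetical mapping from original names to renamed assets.
	manifest := map[string]string{"image.png": "image.3f2a9c.png"}
	fmt.Println(rewriteCSS(css, manifest))
}
```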

jameslai commented 9 years ago

Could we theoretically simply leverage Go templates? Swallow all the CSS and process them like a template on their way to the public assets directory, leveraging the same template function for other assets mentioned above?

.class { background-image: url({{ .GetHash 'image.png' }}) }

Or, perhaps to be a little more consistent with existing conventions:

.class { background-image: url({{ .AssetPath 'image.png' }}) }
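A rough sketch of the "CSS as a Go template" idea, using `text/template` from the standard library. The `assetPath` function and the manifest are assumptions made for illustration. Note that Go templates require double-quoted (or backtick-quoted) string arguments, so the single-quoted form above would not parse as written, and any CSS that happens to contain a literal `{{` would clash with the template delimiters.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderCSS executes css as a Go template, exposing a hypothetical
// assetPath function that looks names up in manifest.
func renderCSS(css string, manifest map[string]string) (string, error) {
	funcs := template.FuncMap{
		"assetPath": func(name string) (string, error) {
			renamed, ok := manifest[name]
			if !ok {
				return "", fmt.Errorf("unknown asset %q", name)
			}
			return renamed, nil
		},
	}
	tmpl, err := template.New("css").Funcs(funcs).Parse(css)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, nil); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	css := `.class { background-image: url({{ assetPath "image.png" }}) }`
	// Hypothetical mapping from original names to renamed assets.
	manifest := map[string]string{"image.png": "image.3f2a9c.png"}
	out, err := renderCSS(css, manifest)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Returning an error for unknown assets makes a missing image fail the build instead of silently producing a broken URL.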

halostatue commented 9 years ago

We certainly could do that—but then (1) your CSS isn’t actually static anymore and (2) your CSS also isn’t immediately usable on its own, which may or may not be a design goal. (Which is to say that I haven’t thought through this completely.)

I’m a little wary of doing that because it now means that you may have conflict with other CSS (or JS or…) preprocessors and minifiers. (It’d be nice if there were a protocol by which such things communicated renames, etc. similar to the Sprockets/Rails manifest. Maybe there is, but I don’t think so. Most of them think that they own the transformation completely.)

earthboundkid commented 8 years ago

In Django, assets are hashed by symlinking base.ext to base.HASH.ext, so that if something links to the original name, it will receive a result, if possibly a stale one. It also pre-processes CSS (but not JS by default under the assumption that JS can read the manifest file if it needs to). I don't see why Hugo couldn't do all of this. I have started working on a personal tool for file hashing. Should I work on a PR for Hugo itself?

bep commented 8 years ago

Should I work on a PR for Hugo itself?

That depends -- what would that PR look like?

earthboundkid commented 8 years ago

Ha, fair. Just trying to gauge interest before sitting down to do the work. The basic idea is to provide a template function that can translate plain paths into base_url + base_path + hash + ext, and then on build create symlinks and a manifest file of some sort.
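As a minimal sketch of that idea (path translation plus a manifest), assuming in-memory file contents and a 12-hex-digit SHA-256 prefix: `hashedName`, `buildManifest`, and `assetURL` are hypothetical helpers, and the symlink step is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"path"
	"strings"
)

// hashedName turns "css/app.css" into "css/app.<hash>.css", where <hash>
// is the first 12 hex digits of the SHA-256 of the content.
func hashedName(name string, content []byte) string {
	sum := sha256.Sum256(content)
	ext := path.Ext(name)
	base := strings.TrimSuffix(name, ext)
	return base + "." + hex.EncodeToString(sum[:])[:12] + ext
}

// buildManifest maps every original name to its hashed name.
func buildManifest(files map[string][]byte) map[string]string {
	manifest := make(map[string]string, len(files))
	for name, content := range files {
		manifest[name] = hashedName(name, content)
	}
	return manifest
}

// assetURL resolves a plain path to base URL + hashed path, falling
// back to the plain path when the manifest has no entry.
func assetURL(baseURL, name string, manifest map[string]string) string {
	if hashed, ok := manifest[name]; ok {
		name = hashed
	}
	return strings.TrimSuffix(baseURL, "/") + "/" + name
}

func main() {
	files := map[string][]byte{
		"css/app.css": []byte("body { margin: 0 }"),
		"js/app.js":   []byte("console.log('hi')"),
	}
	manifest := buildManifest(files)

	// The manifest could be written next to the generated site.
	encoded, err := json.MarshalIndent(manifest, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded))
	fmt.Println(assetURL("http://localhost:1313/", "css/app.css", manifest))
}
```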

sajal commented 8 years ago

I would be satisfied with a .GetHash template function that returns a sha256 hash of the file passed as an argument, and with managing the URLs in my template manually. At this moment I don't care about links inside the CSS.

earthboundkid commented 8 years ago

I looked into implementing this today, but I ran into a minor setback. Is there a way to make a (sym)link in Afero? If not, I suppose I could just copy the files, but it's a bit inelegant.

earthboundkid commented 7 years ago

I have not worked on this since April, but I still think about the issue from time to time. Perhaps I approached it from the wrong angle. Instead of adding a function for cachebusting to Hugo, it might make more sense to have another tool consume the HTML + static assets of Hugo's public directory and create the cachebusted versions from that.

ISTM, Hugo is always going to be in the middle of a pipeline of some sort. You start with files that need pre-processing, like Sass and ES6+, and then you end up with files that need post-processing (cachebusting, minification). It might be good to have Hugo handle the pipeline itself, a la https://github.com/spf13/hugo/issues/47, but it might make just as much sense to leave the managing of the pipeline to Make/Webpack/Godo.

vseventer commented 7 years ago

I struggled with the same issue as outlined here - I use Hugo for both my personal- and travel blog (it's awesome!), and got something working by adding a bunch of scripts to my package.json to do pre- and postprocessing.

Over the past week I have been working on a solution using Webpack. Shameless plug or not, but feel free to take a look at hugo-webpack-boilerplate. Its entry points are the files generated by Hugo, and it supports both pre- (SASS / ES6) and postprocessors (optimization, minification, tree-shaking).

Until Hugo has a pipeline of its own (not sure if it should, but that's another debate), I feel hugo-webpack-boilerplate, or any other similar project (e.g. victor-hugo which uses Gulp, or just some npm scripts) might be the next best thing.

robsonsobral commented 7 years ago

I believe that cache bursting is part of any static website. It isn't anything super specific that only two developers will use. Even if the site is planned to be online for just a few days, any bug can demand an edit. As a static site generator, Hugo should do the cache busting too.

sajal commented 7 years ago

Temporary solution (only for things referred to from within templated HTML):

<link rel="stylesheet" href="/css/custom.css?{{ readFile "relative/path/to/css/custom.css" | md5}}" type="text/css">

earthboundkid commented 7 years ago

My current approach is

sandys commented 7 years ago

+1 to this. I just migrated from jekyll to hugo and jekyll has this built in as a top-level plugin https://github.com/jekyll/jekyll-assets

I believe this needs to be core Hugo - cache and assets are the single most important thing that speed up a website. Without this option in place the websites generated by hugo will be far inferior (out-of-the-box) than other tools - even if hugo itself is very fast.

There are already many third party golang libraries that can be used - http://libs.club/golang/web/assets

rdwatters commented 7 years ago

cache and assets are the single most important thing that speed up a website. Without this option in place the websites generated by hugo will be far inferior (out-of-the-box) than other tools - even if hugo itself is very fast.

This strikes me as an inaccurate (and oddly dramatic) statement.

Have you visited the Hugo forums to see how the community is currently keeping their sites fast? It may provide some insight as to how you can compensate for, how'd you put it, Hugo's inferiority?

yfilali commented 7 years ago

@rdwatters, what @sandys said is on point. Putting assets on a CDN with a long time to live is indeed a very important tool to speed up site download. Without a unique hash added to their urls, you have to rely on cache invalidation to expire the assets when they change. A hash like that is simple to generate as you can simply hash the file and use the hash as the filename with the proper extension. If the file doesn't change, the hash should stay the same and will, therefore, be served from the CDN.
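To illustrate the long time-to-live point, here is a sketch of how a server or origin might choose caching headers once file names carry a content hash. The naming convention (a dot, eight or more hex digits, then the extension), the one-year max-age, and the `cacheControl` and `withCacheHeaders` names are all assumptions for this example, not anything Hugo or a particular CDN prescribes.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
)

// fingerprinted matches names such as "app.e3b0c44298fc.css":
// a dot, 8 or more hex digits, then an extension (assumed convention).
var fingerprinted = regexp.MustCompile(`\.[0-9a-f]{8,}\.[A-Za-z0-9]+$`)

// cacheControl picks a Cache-Control value: hashed assets never change
// under the same URL, so they can be cached for a long time; everything
// else must be revalidated.
func cacheControl(urlPath string) string {
	if fingerprinted.MatchString(urlPath) {
		return "public, max-age=31536000, immutable"
	}
	return "no-cache"
}

// withCacheHeaders wraps a handler and sets Cache-Control per request.
func withCacheHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", cacheControl(r.URL.Path))
		next.ServeHTTP(w, r)
	})
}

func main() {
	for _, p := range []string{"/css/app.e3b0c44298fc.css", "/index.html"} {
		fmt.Printf("%s -> %s\n", p, cacheControl(p))
	}
	// To serve a generated site with these headers, one could run:
	// http.ListenAndServe(":8080", withCacheHeaders(http.FileServer(http.Dir("public"))))
}
```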

Hugo is indeed inferior to Jekyll on this one particular aspect. I hope to see this taken seriously as a feature request as it would allow Hugo to produce devops friendly sites without having to add/learn a different build tool.

rdwatters commented 7 years ago

@yfilali

Putting assets on a CDN with a long time to live is indeed a very important tool to speed up site download. Without a unique hash added to their urls, you have to rely on cache invalidation to expire the assets when they change. A hash like that is simple to generate as you can simply hash the file and use the hash as the filename with the proper extension...

I think I'm missing something. I understand the value of CDNs. That's why I use them extensively. Did I say cache invalidation/busting was bad?

The suggestion to go to the forums was to let users like @sandys (and now you) know that there are conversations regarding performance that may help in the interim. For example, check out this thread, which includes a link to this article, which shows how you can take care of cache busting without any additional Hugo features via a Makefile.

Hugo is indeed inferior to Jekyll on this one particular aspect.

Super! I love Jekyll and think it's incredible software. I use it for projects at work. But that's neither here nor there, so let's stay on topic and focus on what @sandys actually said:

Without this option in place the websites generated by hugo will be far inferior (out-of-the-box) than other tools - even if hugo itself is very fast.

I stand by my previous comment: this is a dramatic and inaccurate statement. A "good website" isn't a checkbox with "cache busting" to the right of it. In fact, even a "fast website" isn't solely a matter of said criterion. Neither Jekyll nor Hugo make your websites auto-magically fast "out of the box."

I hope to see this taken seriously as a feature request

I never said it shouldn't be. I'm all for making sites faster. In fact, I'm obsessed with it.

A hash like that is simple to generate as you can simply hash the file and use the hash as the filename with the proper extension.

That's great news. Any interest in submitting a PR to implement this feature?

Thank you for your input and continued support and ❤️ of Hugo.

sandys commented 7 years ago

Asset pipelines are one of the most useful tools for web developers. I think the answer that people are giving is whether the "purity" of Hugo vanishes, once you put in other stuff like asset pipelines.

Maybe it does, but I'm not sure what the alternative is - suggesting that we install npm and nodejs as a Hugo preprocessor step means that I lose the overall speed (that Hugo is famous for) and don't gain much.

My intention was not to piss people off, which seems to be the case here. The previous reply was unusually aggressive - perhaps that's a cue for me to stay away.

On Apr 8, 2017 06:04, "Ryan Watters" notifications@github.com wrote:

@yfilali https://github.com/yfilali

Putting assets on a CDN with a long time to live is indeed a very important tool to speed up site download. Without a unique hash added to their urls, you have to rely on cache invalidation to expire the assets when they change. A hash like that is simple to generate as you can simply hash the file and use the hash as the filename with the proper extension...

I think I'm missing something. I understand the value of CDNs. That's why I use them extensively. Did I say cache invalidation/busting was bad?

The suggestion to go to the forums was to let users like @sandys https://github.com/sandys (and now you) know that there are conversations regarding performance that may help in the interim. For example, check out this thread https://discuss.gohugo.io/t/cache-busting-of-css-js/175/11, which includes a link to this article https://ukiahsmith.com/blog/hugo-static-asset-cache-busting/, which shows how you can take care of cache busting without any additional Hugo features via a Makefile.

Hugo is indeed inferior to Jekyll on this one particular aspect.

Super! I love Jekyll and think it's incredible software. I use it for projects at work. But that's neither here nor there, so let's stay on topic and focus on what @sandys https://github.com/sandys actually said:

Without this option in place the websites generated by hugo will be far inferior (out-of-the-box) than other tools - even if hugo itself is very fast.

I stand by my previous comment: this is a dramatic and inaccurate statement. A "good website" isn't a checkbox with "cache busting" to the right of it. In fact, even a "fast website" isn't solely a matter of said criterion. Neither Jekyll nor Hugo make your websites auto-magically fast "out of the box."

If you prefer more quantitative substantiation, please make a site in Jekyll that includes 20 spaghetti-coded scripts—all called individually without async or defer attributes—doesn't tree shake, includes a bloated CSS framework, doesn't compress/minify/uglify, doesn't inline styles for critical render path, and doesn't optimize images...BUT includes cache busting for each deployment.

Then test it on a slower 3G connection and let me know how fast your experience is as an end user.

I hope to see this taken seriously as a feature request

I never said it shouldn't be. I'm all for making sites faster. In fact, I'm obsessed with it https://developers.google.com/speed/pagespeed/insights/?url=https%3A%2F%2Fhugodocs.info%2Fabout%2F .

A hash like that is simple to generate as you can simply hash the file and use the hash as the filename with the proper extension.

That's great news. Any interest in submitting a PR to implement this feature?

Thank you for your input and continued support and ❤️ of Hugo.


rdwatters commented 7 years ago

The previous reply was unusually aggressive - perhaps that's a cue for me to stay away.

I didn't mean it to be 🙇

Have you checked out Victor Hugo? It's actively maintained by Netlify, which offers a lot of cool features (even on the free tier), including, but not limited to instant cache invalidation, CDN, etc. I know it's not exactly what you're looking for, but I'm confident it will be faster w/r/t assets/builds than many other SSGs...

Oh, and I should mention there is a Starter Kit page on the new docs concept that you might like. But I ask that if we continue this part of the conversation, we do so in the forums, please.

yfilali commented 7 years ago

@rdwatters would something like this fly? I tested it locally and it works a charm.

https://gist.github.com/yfilali/54f67de3987bfa85afc057450fa6a7e8

Since hugo already processes css files through cssmin, I output a duplicate file with an md5 hash in the same handler.

From there, I just need to use this in my template to get a cache busting css file that only changes when the source content changes:

<link href="/css/styles-{{ readFile "content/css/styles.css" | md5 }}.css" rel="stylesheet">

From there, maybe a similar approach with other filehandlers for the most common asset types would take care of this.

sandys commented 7 years ago

@rdwatters the big advantage which got me here was that hugo is a single binary that runs on windows as well. I'm not sure I'm keen on pulling in an entire nodejs based toolchain with tons of its own dependencies to run a reasonable static website.

I respect your authority in the hugo community - and I accept that we will not see hugo taking on any asset responsibilities. I'll try to figure out what else can be done.

Interestingly, because of the complexity of managing a production asset pipeline in hugo (e.g. multiple dependencies), we have some very good paid services like Netlify, as you pointed out. It's actually not a bad idea. Thanks for that hint!

digitalcraftsman commented 7 years ago

Interestingly, because of the complexity of managing a production asset pipeline in hugo (e.g. multiple dependencies), we have some very good paid services like Netlify, as you pointed out. It's actually not a bad idea. Thanks for that hint!

Adding all this stuff to the core of Hugo isn't a real solution. We envision a plugin API that would help to outsource such tasks to self-contained plugins. This would actually close the gap to more dynamic, interpreted languages like Ruby.

earthboundkid commented 7 years ago

No one wants to install npm. It sucks as a dependency, for sure. Non-deterministic builds are no fun. But realistically, are you going to rewrite Webpack, Babelify, and Autoprefixer in Go? (Libsass kind of exists in Go, but it's a C library and makes builds quite slow.) Sure, simple hashing is good and not too hard to add to Hugo, but JavaScript's ecosystem is hard to escape altogether. I think we can and should do more within Hugo, but we're also going to have to integrate with external tools as well.

sajal commented 7 years ago

+1 for plugin idea. something to run on source before or after hugo uses the file, on each rebuild. would help me preview all my changes (for which i currently use makefile) in live server.

yfilali commented 7 years ago

As far as cache busting is concerned, because hugo already processes some static content (for example, CSS through cssmin), it makes sense to further optimize the generated output. This ticket requests something far simpler than a full asset pipeline: "Add cache busting for static resources"

I think the css handler I modified is a good start to do what this ticket calls for. Would be happy to refine it and submit a pull request if it passes muster.

The change is not huge and could be further refined to depend on flags like --cachebust or whatever, and to automatically change the links to that content using something like relref or such.

https://github.com/spf13/hugo/compare/master...yfilali:master

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

earthboundkid commented 6 years ago

Thanks stalebot, this is a feature request. I think it still has value. @bep, did the page bundle thing ever happen? Would that affect the ability to add content hashes to CSS/JS?

nfisher commented 6 years ago

I would prefer if it appended the SHA to the base name of the file rather than as a query parameter. This would allow multiple versions to exist if you rsync without deletion or similar. Further, it would allow cache-control headers that could essentially mark the asset as immutable (collisions notwithstanding). Essentially, Hugo would be responsible for rewriting the filename when it copies it into the output folder, something like this:

css/app.css -> css/app-{{base62(sha256(css/app.css))}}.css
css/app.css -> css/app-2bCRQw5kkEoR900YAaE7XKdFfstdWE.css
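The renaming scheme above could be sketched in Go as follows. This is an assumption-laden illustration: `base62SHA256` and `fingerprintName` are hypothetical names, and base62 is approximated with `math/big`'s base-62 text encoding (digits, then lower-case, then upper-case letters), which may differ from other base62 alphabets and will not necessarily reproduce the sample name shown.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
	"path"
	"strings"
)

// base62SHA256 hashes content with SHA-256 and encodes the digest as a
// base-62 number. Leading zero bytes are dropped, so the length can vary
// slightly between inputs.
func base62SHA256(content []byte) string {
	sum := sha256.Sum256(content)
	return new(big.Int).SetBytes(sum[:]).Text(62)
}

// fingerprintName inserts the encoded hash before the extension:
// "css/app.css" becomes "css/app-<hash>.css".
func fingerprintName(name string, content []byte) string {
	ext := path.Ext(name)
	return strings.TrimSuffix(name, ext) + "-" + base62SHA256(content) + ext
}

func main() {
	fmt.Println(fingerprintName("css/app.css", []byte("body { margin: 0 }")))
}
```

Because the name depends only on the content, old and new versions can coexist in the output folder, which is the rsync-without-deletion case described above.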

The problem with using a query parameter to bust the cache is that only one file can exist in the asset folder. It also causes issues where many CDNs will not cache requests with query parameters.

This is particularly an issue if, for example, you have HTML that is cached and is incompatible with freshly deployed changes.

I've put together a proposal for a more general Asset function which handles both caching and SRI in issue #4268.
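For context on the SRI half of that proposal: a Subresource Integrity value is the algorithm name, a hyphen, and the standard base64 encoding of the digest. The sketch below shows one way to compute it in Go; `integrity` and `scriptTag` are hypothetical helpers and do not reflect the API proposed in #4268.

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
)

// integrity returns an SRI value of the form "sha384-<base64 digest>".
func integrity(content []byte) string {
	sum := sha512.Sum384(content)
	return "sha384-" + base64.StdEncoding.EncodeToString(sum[:])
}

// scriptTag renders a script element carrying the integrity attribute.
func scriptTag(src string, content []byte) string {
	return fmt.Sprintf(`<script src="%s" integrity="%s" crossorigin="anonymous"></script>`,
		src, integrity(content))
}

func main() {
	js := []byte("console.log('hi')")
	fmt.Println(scriptTag("/js/app.js", js))
}
```

The same digest can double as the cache-busting fingerprint, which is why the two features are often handled by one function.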

bep commented 6 years ago

This is incredibly hard to imagine inside "Hugo core". On the other hand, it is incredibly easy to imagine inside an "assets pipeline" -- in some kind of "hugo plugin" thing, in company with other stuff like JS minification etc.

nfisher commented 6 years ago

@bep thanks! To clarify, is that hard to imagine as in not a good fit for core, difficult to implement, or something else?

bep commented 6 years ago

@nfisher I take my last comment back, I wasn't thinking too hard about this. This is "something different" than, say, HTML minification.

To do this, Hugo would have to "know about the resources". So if these files were Resources (as in the Hugo 0.32 way of doing things), you could add cache bursting stuff to it. But that may not be what you strive for. But it would work. Note that, for now, the resources must live inside /content, but I believe we're going to see resources fetched from different places, eventually.

nfisher commented 6 years ago

@bep in #4268 I'm suggesting a generic asset function which could be used in HTML templates. Sounds like there could be some restrictions on how it could work though from what you're stating.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

hanzei commented 6 years ago

Hugo 0.43 added hugo pipes. They can be used for cache bursting.

davidcalhoun commented 4 years ago

Just a note that the term is technically cache busting. Here's a writeup on how to implement it with pipes: https://regisphilibert.com/blog/2018/07/hugo-pipes-and-asset-processing-pipeline/

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.