gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0
74.82k stars 7.46k forks source link

Support for minification of generated HTML files #1251

Closed chaseadamsio closed 6 years ago

chaseadamsio commented 9 years ago

When generating HTML, it'd be really nice to have the code minified at least as a single line of HTML (instead of just output with lots of spacing and line breaks). Would it be hard to implement this?

Dr-Terrible commented 9 years ago

For these kind of tasks there is an handy go library that could be easily integrated in the Hugo's pipeline: https://github.com/tdewolff/minify

bep commented 9 years ago

This has been discussed, and this may be relevant to the plugin discussion. @Dr-Terrible I haven't looked at minify -- but if it is small and well written, one could argue that HTML minify should be a core feature off Hugo (and not a plugin).

But with plugins we could pull in all kinds of stuff, even stuff not written in Go.

chaseadamsio commented 9 years ago

For the time being, I'm using a gulp task (since I'm already using it for deploying to s3) to minify and update asset hashes.

Personally, I'm down with the plugin architecture...I'd be interested to hear more about the discussion. I'll look around and lurk. Thanks!

bep commented 9 years ago

I use gulp for similar stuff myself. I would love to ditch that. It should be possible, the hardest part getting a decent LESS implementation in Go, but maybe that's not necessary.

Search for plugin here and at http://discuss.gohugo.io/

Dr-Terrible commented 9 years ago

@Dr-Terrible I haven't looked at minify -- but if it is small and well written, one could argue that HTML minify should be a core feature off Hugo (and not a plugin).

@bep That is why I have suggested minify in the first place. It's relatively small, well written, and extremely fast too (<10ms per page). I'm completely against the idea of plugins (I saw the discussion on http://discuss.gohugo.io/ some time ago); for that kind of features there is npm, gulp, grunt, and tons of tools alike. Unfortunately, there aren't decent plugins for HTML tidying/minification, and I agree with you that one could argue that it should be a core function of Hugo (but only for HTML page, no CSS/JS/SVG/XML etc). I like that idea, since HTML pages generated by Hugo have a lot of dead weight on due to the Go template system injecting white spaces and new lines like there is not tomorrow. I'd like Hugo to be able to trim that fat too (with Jekyll there is a specific plugin to remove the extra noise from Shopify) .

In case, we can ask @tdewolff, the author of minify, to branch off a library specific only to HTML minification, leaving behind all the rest. Or we can do it by ourself; after all, the HTML part of minify is just around ~400 lines of code, and the lexer-parser is even shorter: https://github.com/tdewolff/parse/tree/master/html

bep commented 9 years ago

@Dr-Terrible agree about the HTML only part. There are ways to make parts of an library optional in Go (two that I know of), but I guess only of real value if you also get rid of some heavy dependencies. See this Viper-commit for an example of a big-win: https://github.com/spf13/viper/commit/be5ff3e4840cf692388bde7a057595a474ef379e

bep commented 9 years ago

Had a quick look, and it looks like minify have a very decent split into packages, so you can use only what you need.

bep commented 9 years ago

@spf13 what are your thoughts on this?

tdewolff commented 9 years ago

Minify is already split into the various formats as subpackages. Is that what you need?

I generally find that most compression of HTML is due to embedded JS or CSS, but maybe this doesn't apply to Hugo. Let me know if you need anything.

spf13 commented 9 years ago

There's a solid argument to be made that Hugo should support gzipping pages. Minification is pretty worthless compared to gzip when you compare the two.

For a while I've thought hugo should operate in modes. The two obvious ones are dev and prod.

Dev would automatically turn on server and watch. Prod would enable things like compression etc.

I'm not against minification. There's a good argument for both minification and gzip being the best. http://stackoverflow.com/questions/807119/gzip-versus-minify

chaseadamsio commented 9 years ago

Would it be unreasonable to have both as flags or configuration variables for prod?

bep commented 9 years ago

@chaseadamsio hat would make sense. But is prod vs dev an extra flag?

chaseadamsio commented 9 years ago

I don't think it's necessary since we have --watch which is essentially dev. I see it as an --optimize flag with false as default, gzip for gzip and minify as minify. This would satisfy "my requirements" (I'd be interested to hear other people's take on it.

I can appreciate saying one is better than the other, but it seems debilitating to pick one for the end user and pigeonhole them to that, especially when this is build time, so it's not a lot of extra overhead to have both options.

spf13 commented 9 years ago

my plan is to keep the flags approach as you all have outlined (except I'd break out gzip and minify into separate flags).

I'd add two commands, hugo prod and hugo dev which are just shortcuts that enable a set of flags.

I'm also fine with --optimize which would do a few optimizations all at once.

There's a long history of tools that provide shortcut flags and commands that effectively enable commonly used settings together.

chaseadamsio commented 9 years ago

@spf13 I'm happy to help with this if there's an approach outlined somewhere that you feel is "blessed" to start on.

spf13 commented 9 years ago

@chaseadamsio Would love help. I would suggest the following.

Doing these separately and independently (but thinking of them working together)

Adding gzip support Adding minify support

I wouldn't worry about the shortcut options yet.

I would only support optimization for content to start (leave css & js alone for now). A second PR could add support for minification of CSS & JS when placed in the content directory.

The general rule I follow is that any file in content means that the creator wants Hugo to process it. Anything in static remains untouched and just byte for byte copied over.

It's probably best to add this functionality in https://github.com/spf13/hugo/tree/master/target

Jos512 commented 9 years ago

The general rule I follow is that any file in content means that the creator wants Hugo to process it. Anything in static remains untouched and just byte for byte copied over.

That makes sense, but in the case of gzip it would mean that js and css files aren't gzipped, despite them being often much larger in size than HTML pages are. In other words, a gzip feature for only HTML will be a minor improvement, while gzipping all eligible files would be a big improvement.

spf13 commented 9 years ago

I think the gripping may be better as a flag that does it absolutely for all files along side non-gzip versions.  I haven't given this enough thought though on maintaining links and whatnot. 

Best, Steve

Steve Francia http://stevefrancia.com http://spf13.com http://twitter.com/spf13

On Tue, Jul 21, 2015 at 7:57 AM -0700, "Jos512" notifications@github.com wrote:

The general rule I follow is that any file in content means that the creator wants Hugo to process it. Anything in static remains untouched and just byte for byte copied over.

That makes sense, but in the case of gzip it would mean that js and css files aren't gzipped, despite them being often much larger in size than HTML pages are. In other words, a gzip feature for only HTML will be a minor improvement, while gzipping all eligible files would be a big improvement.

— Reply to this email directly or view it on GitHub.

whereisaaron commented 9 years ago

I'd weigh in for gzip as the first built-in feature to include. It is much much better compression than minifying, which I assume is the goal here. And it is language agnostic, so it will work CSS and JS and JSON and XML and whatever else comes down the pike. If would be good to apply to every file type possible, if enabled.

Of course also having a minify option is fine so long as it is optional. I have a few, admittedly tangential arguments against minification. Minify is (often) language-specific and IMHO carries more risk of triggering platform/version/language specific bugs. Minification also obfuscates code (especially if identifiers are replaced with shorter, less meaningful ones), making your code and errors hard to debug for clients. You might like the DRM effect of that, or you could argue it is evil to pollute The Internet Archive, and history, with your minified junk :-)

ghost commented 8 years ago

I'm also in support of gzip for hugo. I was testing nginx and found under a certain configuration that a 1MiB uncompressible file took 50ms to serve over localhost if gzip was turned on. If I had it precompressed, or gzip disabled, it was more or less instaneous. Notably smaller files weren't affected, but I'm not sure what the gap is.

CloudFront also only supports serving pre-compressed files with the .gz next to non-compressed. At a certain point, gzipping the content will be the bottleneck on the server. There's basically no reason not to pre-compress the contentent. Down the road, I wonder if I xz will be added, if the decompression is fast enough. It's far too slow consider for on-the-fly, but doable for precompressed. I would even prefer lzo over gzip for on-the-fly.

I created #1674 as gzip-specific.

bep commented 8 years ago

Notably smaller files weren't affected, but I'm not sure what the gap is.

Hugo content = smaller files, I would say.

ghost commented 8 years ago

@bep

The "small" files were a few lines of text. This may come in to play depending on the size of the gzip buffers enabled for nginx. May be 4KiB, 8KiB, or even up to 32KiB (though I don't know how those settings work exactly). I'd suspect a lot of the pre-compressed output is in the 5-20KiB range. So there's a chance of a huge performance gain or a slight performance gain here. I'm sure someone else has done more testing with the gzip aspect of nginx than I and could tell you exactly why it behaves that way.

Main drawback is a somewhat increased size in disk usage and upload bandwidth. And you can always turn it off (or not turn it on) if that's a problem.

joelbeckham commented 8 years ago

Just a quick note that CloudFront added gzip compression yesterday: https://aws.amazon.com/blogs/aws/new-gzip-compression-support-for-amazon-cloudfront/

bep commented 8 years ago

https://github.com/tdewolff/minify

tajmone commented 8 years ago

Thumbs up for this issue! I was about to file an issue for the same topic but found it already in place.

I totally agree with the dev/prod approach. From my experience, I would think that most users would expect default Hugo behavior in production mode to be TIDY (ie: proper indenting of HTML code), so it looks nice at source inspection. Definitely MINIFY would be the second choice inline.

I didn't think of gzipping, and I don't have much practical experience with it, but it sounds a good idea — if it can handle also non HTML files.

gunnarmorling commented 7 years ago

Just came across this.

Not even thinking of full minification, I'd personally already be very happy if Hugo would not emit many subsequent empty lines (with trailing whitespace). I understand it's happening when template instructions are processed. For ranges this gives me lots of empty lines in the generated HTML.

Should I file a separate issue for that one (as it arguably could/should be fixed independently of full minification)?

bep commented 7 years ago

Should I file a separate issue for that one (as it arguably could/should be fixed independently of full minification)?

You can do this yourself in the templates by using the {{- print "hello world" -}} syntax, which is clumsy, but the best the Go team could come up with without breaking stuff.

We do, however, do some AST transformations of the Go templates, so adding some whitespace handling as an option could be doable -- but another issue.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

regisphilibert commented 6 years ago

I can see the issue has been mentioned in https://github.com/gohugoio/hugo/issues/3207. I personally think they should not be related.

  1. Contrary to most build scripts, html modification must happen after Hugo.
  2. Because Hugo outputs the html, it would only feel normal that its process does the html minification without having to wait for a more complex pipeline-based content processing feature.

My own two cents.

regisphilibert commented 6 years ago

Now that hugo handles minification nicely for several formats (html, json, js, etc...): https://github.com/gohugoio/hugo/commit/dea71670c059ab4d5a42bd22503f18c087dd22d4 I wonder if this minfication couldn't be performed for all pages' output formats.

It could be as simple as a parameter added to the output format object in config.yaml.

outputFormats:
    html:
        minify:true
     json:
        minify:true
Jos512 commented 6 years ago

It could be as simple as a parameter added to the output format object

True, but we'll also need some minimum configuration options. For my setup I'll need to keep one type of HTML comments since the server relies on them.

And perhaps others might want to configure how aggressive whitespace should be collapsed (since this can affect how the webpage looks, even though with good design that wouldn't happen).

regisphilibert commented 6 years ago

True, but we'll also need some minimum configuration options.

Minimum is never enough but for one configuration.

I don't think (but may be wrong) that the maintainers will spend time on a configuration options set which will suit everyone. I think we should find what is the most used minifying configuration and apply it by default.

If users need more than this, they will have to fallback to other minification tools.

Maybe someone well aware of HTML minification configurations may create a "ideal config/usecase" poll?

regisphilibert commented 6 years ago

As a reference here is an html-minifier's list of options and defaults I counted ~35 😱

zivbk1 commented 6 years ago

Is there a case open for compression only? I care more about it and less about minification.

Jos512 commented 6 years ago

I don't think (but may be wrong) that the maintainers will spend time on a configuration options set which will suit everyone. I think we should find what is the most used minifying configuration and apply it by default.

Not sure if I follow. You don't want contributors spend time on configuration options (and I understand that :slightly_smiling_face:). But they should figure out what minifying options work for most website without breaking them.

I understand where you're coming from, but your approach might end up taking a lot more time. Both in coding and testing, and then later on again in providing support.

Exposing the minification options from the library that Hugo uses takes less time. And, more importantly, that also places the responsibility of minifying not working on the user. He or she then needs to tweak the minification options instead of opening GitHub issues to 'make Hugo minify work with my website' or 'Hugo minify breaks my website'.

onedrawingperday commented 6 years ago

Exposing the minification options from the library that Hugo uses takes less time. And, more importantly, that also places the responsibility of minifying not working on the user.

I agree with the above. It would be ideal to have the various options available in the config.

However this may be more time consuming to implement than it seems.

regisphilibert commented 6 years ago

But they should figure out what minifying options work for most website without breaking them.

No we would try and do that for them. As we I mean the people pushing for this feature.

Exposing the minification options from the library that Hugo uses takes less time.

I wouldn't know. But if true, then rest assured they'll go with that option instead of my one-case-for-all suggestion.

And, more importantly...

All that follows is beside my point. And I agree with all of it, if exposing the minification options don't take any more time. Even though that would mean more work for the documentation maintainers.

Jos512 commented 6 years ago

No we would try and do that for them. As we I mean the people pushing for this feature.

Ah I see. We can certainly do that, once we know which library Hugo intends to use (and what minifying options that would give).

Even though that would mean more work for the documentation maintainers.

If Hugo uses the same minifier as it does for pipes, then there are 5 configuration options (see here). All those have sensible names (KeepWhitespace). That seems like an amount of documentation work that's certain doable.

regisphilibert commented 6 years ago

If Hugo uses the same minifier as it does for pipes, then there are 5 configuration options (see here).

True at a documentation standpoint listing the configuration option does not look too complicated it if is a global configuration. But then again, I would not be surprised if it had some unknown consequences I am not able to pinpoint.

outputFormats:
    html:
        minify:true
     json:
        minify:true

minificationOptions:
    KeepConditionalComments: true
    KeepEndTags: false

I agree with you that 5 options does not seem too much of a burden as a Hugo user, as a Hugo maintainer, again, I cannot say.

bep commented 6 years ago

In this particular case, it makes most sense to just make it a complete hash map ... and will in some sensible defaults if not set. We document the small sub-set and link to the full set for people who really needs it.

tdewolff commented 6 years ago

I just wanted to say, as the author of minify, that I specifically intent to keep the amount of options down as much as possible. The minifiers should work as they are for most use cases, and I only want to restrict options to cases that are debatable such as the precision in numbers (CSS, SVG) and keeping whitespace (bad HTML design might be dependent on whitespace). Otherwise the defaults are intended to work for everyone.

In particular I'd be keen to remove the KeepDocumentTags and KeepEndTags, as they address issues for non-compliant browsers/HTML parsers. KeepDefaultAttrVals is only there for CSS rules that specifically select input[type="text"] while not selecting input:not([type]), but both are equal in terms of HTML/display! I will be lenient on those three to keep the API stable, but without strong arguments for any of them they might get removed at some point.

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.