EmidioStani / htmlcompressor

Automatically exported from code.google.com/p/htmlcompressor
Apache License 2.0

Use a line feed character when reducing some whitespace to improve readability of compressed markup. #42

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Overview
========

Currently every whitespace sequence is replaced with a single space
character in the compressed markup. This generally results in one long line
that is very difficult for a human to read.

If certain whitespace sequences were condensed to a line feed instead of a 
space, the result would be considerably more readable. Obviously this should 
not happen to every whitespace sequence, as that would be even less readable 
than the current implementation.

I propose condensing a whitespace sequence to a line feed only if it already
contains at least one newline character (a line feed or carriage return).

Example
=======

    <ul>
        <li>Foo</li>
        <li>Bar <em>baz</em></li>
    </ul>

Instead of the current output:

    <ul> <li>Foo</li> <li>Bar <em>baz</em></li> </ul>

the compressed markup would become:

    <ul>
    <li>Foo</li>
    <li>Bar <em>baz</em></li>
    </ul>
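
The proposed rule can be sketched in a few lines. This is only an illustration in Python, not htmlcompressor's code: the name `condense_whitespace` is made up for this example, and the sketch treats the markup as plain text, so it ignores tags, attribute values and whitespace-sensitive elements.

```python
import re

# Any run of one or more whitespace characters.
WHITESPACE_RUN = re.compile(r"\s+")


def condense_whitespace(markup):
    # Hypothetical helper: a run becomes "\n" if it already contains a
    # line feed or carriage return, otherwise it becomes a single space.
    def replacement(match):
        run = match.group(0)
        return "\n" if ("\n" in run or "\r" in run) else " "

    return WHITESPACE_RUN.sub(replacement, markup)


source = "<ul>\n    <li>Foo</li>\n    <li>Bar <em>baz</em></li>\n</ul>"
print(condense_whitespace(source))
```

Applied to the list above, the space inside `Bar <em>` stays a space, while the indented line breaks between the tags each shrink to a single line feed.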

Considerations
==============

 1) The line feed character is not universally recognised as a new line. Presumably it *is* recognised as whitespace by all major browsers on all platforms. If this assumption turns out to be false, this feature request falls somewhat flat.
 2) The behaviour will not be desirable to all, and adding the check for an existing newline necessarily slows down the compression (at least slightly), so it should be optional.
 3) The behaviour may not be wanted in certain situations, e.g., inside a tag or an attribute value:

        <div class="container"
             style="color: lime;
                    background-color: magenta"></div>
 4) Humans are not the intended target of this application. As the source is readily viewable to end users, though, it would be considerate.
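
One way to handle consideration 3 is to apply the line feed rule only to text between tags and to collapse whitespace inside a tag to a single space. The sketch below is an assumption about how that could look, not how htmlcompressor works; `condense_outside_tags` is a hypothetical name, and the `<[^>]*>` pattern is deliberately naive (it would break on a `>` inside an attribute value).

```python
import re

# Naive tag matcher; the capturing group makes re.split keep the tags.
TAG = re.compile(r"(<[^>]*>)")
WHITESPACE_RUN = re.compile(r"\s+")


def _newline_or_space(match):
    run = match.group(0)
    return "\n" if ("\n" in run or "\r" in run) else " "


def condense_outside_tags(markup):
    pieces = []
    # With one capturing group, odd indices hold tags, even indices hold text.
    for index, part in enumerate(TAG.split(markup)):
        if index % 2 == 1:
            # Inside a tag: always collapse to a single space.
            pieces.append(WHITESPACE_RUN.sub(" ", part))
        else:
            # Between tags: keep a line feed where a newline already existed.
            pieces.append(WHITESPACE_RUN.sub(_newline_or_space, part))
    return "".join(pieces)


source = (
    '<div class="container"\n'
    '     style="color: lime;\n'
    '            background-color: magenta"></div>\n'
    "<p>text</p>"
)
print(condense_outside_tags(source))
```

With this split, the multi-line `style` attribute ends up on one line, and only the break between `</div>` and `<p>` survives as a line feed.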

Original issue reported on code.google.com by al.barr...@gmail.com on 27 Jun 2011 at 4:06

GoogleCodeExporter commented 8 years ago
Thanks for the suggestion.

I've had something similar implemented already (unintentionally), but it was 
reported as a bug which I fixed, as it produced inconsistent compression 
results on Linux and Windows because \n is recognized as a new line character 
only on *nix, on Windows it is \r\n. 

In your first paragraph you said it would make code more readable, but at the 
end you said "Humans are not the intended target of this application". Then who 
is?

I am not sure if code readability is something I need to worry about. If you
look at any javascript compressor, they produce a very hard-to-read one-liner,
and that is kind of the whole point.

Usually if I need to check something in compressed HTML I just use the browser's
developer tools, which show the HTML as a DOM tree. There are also HTML
beautifiers out there.

Original comment by serg472@gmail.com on 27 Jun 2011 at 5:40

GoogleCodeExporter commented 8 years ago
> \n is recognized as a new line character only on *nix, on Windows it is \r\n. 

This is mostly what I was getting at with point 1. However, most 
editors/viewers I have used recognise the alternative newline sequences 
regardless of platform - the only exception that comes to mind is Notepad. 
Perhaps if the feature were to be implemented, the newline sequence could be 
configurable. Allowing multi-character newlines such as \r\n would result in
suboptimal compression, though.
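
To make the size trade-off concrete, here is a rough Python sketch with a configurable newline sequence. The `newline` parameter and the function name are hypothetical and do not correspond to any existing htmlcompressor option.

```python
import re

WHITESPACE_RUN = re.compile(r"\s+")


def condense_whitespace(markup, newline="\n"):
    # Runs that contain a line break become `newline`; all others become a space.
    return WHITESPACE_RUN.sub(
        lambda m: newline if ("\n" in m.group(0) or "\r" in m.group(0)) else " ",
        markup,
    )


source = "<ul>\r\n    <li>Foo</li>\r\n    <li>Bar</li>\r\n</ul>"
with_lf = condense_whitespace(source, newline="\n")
with_crlf = condense_whitespace(source, newline="\r\n")

# Compare the output sizes for the two newline sequences.
print(len(with_lf), len(with_crlf))
```

Every preserved break costs one extra character with `\r\n`, so the overhead grows with the number of lines in the document.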

> In your first paragraph you said it would make code more readable, but at the 
end you said "Humans are not the intended target of this application". Then who 
is?
>
> I am not sure if code readability is something I need to worry about. If you 
look at any javascript compressor, they produce very hard to read one-liner, 
and it is kind of the whole point.

The point of this feature would be to make code more readable, but I was 
explicitly noting that readability is obviously not a primary concern for the 
project. The benefits probably do not outweigh the cost if it is non-trivial to 
implement.

If it *was* sanely implementable, I think it would be a useful optional 
feature. Looking at the first example, the result with line feeds follows the 
principle of least surprise (in my opinion).

Original comment by al.barr...@gmail.com on 27 Jun 2011 at 6:06

GoogleCodeExporter commented 8 years ago
Ok, I will add it to my todo list. 

Original comment by serg472@gmail.com on 27 Jun 2011 at 6:29

GoogleCodeExporter commented 8 years ago
Well, I don't think that feature would be very popular. I would never use it
myself. It is useful for debugging purposes, but in that case you can just
disable compression. So I guess whether this feature gets implemented depends
on its popularity. Or someone can contribute and develop it (like I did
for the Maven migration: I needed it, so I contributed) :)

Original comment by alextu...@gmail.com on 30 Jun 2011 at 8:14

GoogleCodeExporter commented 8 years ago
Added in 1.4 release.

The command line compressor has a --preserve-line-breaks option.
The Java API has a setPreserveLineBreaks(true) method.

It preserves the original line breaks in the document (multiple breaks are
collapsed). So when compressing a document such as:

    a\r\n\r\n\r\n
    b\n\n\n
    c

the result will be:

    a\r\n
    b\n
    c
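
The collapsing shown in this example can be imitated with a small Python sketch. It only mirrors the behaviour described in this comment and is not the library's implementation; `collapse_line_breaks` is a made-up name, and the sketch assumes that only `\n` and `\r\n` count as line breaks and that the breaks in a run are directly adjacent.

```python
import re

# One line break followed by at least one more line break.
REPEATED_BREAKS = re.compile(r"(\r?\n)(?:\r?\n)+")


def collapse_line_breaks(text):
    # Keep the first break of each run exactly as it was written.
    return REPEATED_BREAKS.sub(r"\1", text)


document = "a\r\n\r\n\r\nb\n\n\nc"
print(repr(collapse_line_breaks(document)))
```

Because the first break of each run is kept as written, a document with Windows line endings keeps `\r\n` and one with Unix line endings keeps `\n`.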

Thanks.

Original comment by serg472@gmail.com on 8 Jul 2011 at 10:43