Open bradbeattie opened 13 years ago
Hi,
the issue happens also inside p tags that span multiple lines. For example:
<p>this paragraph spans multiple lines<p>
shows on the browser as:
this paragraph spansmultiple lines
Note that not only the multiple leading spaces on the second line are removed, the new line is not converted to a white space, so the words spans and multiple get together.
Great tool BTW.
Hi!
I'm working on this, but I had a problem reproducing the space problem between two tags... I wrote a test to test the exactly same link case that you showed up, but I couldn't reproduce it, the test is passing... could you tell me what version you're using...?
The problem reported by @jcampanello I was able to catch on the tests, but I'll not finish the code right now. I just wanted to give an update for you guys that we are working on that. :)
Hi Flavia,
the package got installed as "django_htmlmin-0.5-py2.6.egg" (i.e: html-min v 0.5 on python 2.6).
Do you have an idea of when do you plan to release the fix?
Thanks for the effort!
José Luis
Howdy!
I'll probably solve this bug until the end of this week.
;)
Super! Thanks again!
Howdy!
Please, update to 0.5.1 :)
Flavia,
did the upgrade. Everything seems ok.
Thanks again!
According to @bradbeattie, there are still issues with white spaces removal between two tags. e.g.
<button>First</button> <button>Second</button>
I'm gonna take a look at this, and see if makes sense to fix, as soon as I can :)
Note that buttons are just one example. Inline divs will also be an issue, but you can't tell whether a div will be online or not without deep inspection of the CSS and JavaScript. And what about elements with white-space: pre
?
div's are not supposed to be inline, ever. If you need an inline element, don't use a div.
Anyway, this issue is a little tricky, I'm thinking about it...
Right, thanks for sharing :)
i'm still having the issue. Any news fsouza?
not yet, sorry for the delay.
I'll take a look in the next weekend.
Any news on this?
ping. any news on this ? It is messing up my pages.
Looking at the code and the tests, I can see why the whitespaces are dumped and why the test passes (it is made so it passes but does not cover the real problem).
In htmlmin/tests/resources/line_break.html
the line starts with a whitespace which makes the test pass and gives the illusion that line break and whitespace is properly covered.
I think the htmlmin.minify.html_minify()
logic is broken by design, as soon as the source is broken on line breaks it makes the proper handling of line breaks to spaces, multiple spaces, etc very difficult. I see several edge cases like \n
where this won't be rendered properly. And of course the simple <p>my text\nbroken in two lines</p>
is not rendered properly.
I think that a better way (probably slower though) would be to use beautifulsoup to dive in the dom tree and work on a element base. I'll try to implement a solution along those lines.
I'm coming back with some good news, I started by cleaning up a lot of the tests that were incorrect (trailing spaces just before </body>
in htmlmin/tests/test_middleware.py
for example). Then, I implemented what I think is a better and stronger way to parse the html source. A big problem I saw with the previous code was in the multiline handling and some edge cases. I think the code now is more robust. It's also faster! I couldn't believe it in the beginning but you can check by yourself, it's most of the time ~40% faster:
running 10000 rounds
with_html_content_in_pre.html
old style: 38.43s
recurse: 21.97s (57.16%)
with_textarea.html
old style: 44.63s
recurse: 25.22s (56.50%)
without_doctype.html
old style: 34.30s
recurse: 18.61s (54.27%)
with_entities_in_textarea.html
old style: 49.32s
recurse: 29.76s (60.35%)
html5.html
old style: 58.72s
recurse: 34.28s (58.39%)
blogpost.html
old style: 231.60s
recurse: 143.30s (61.87%)
with_pre.html
old style: 97.18s
recurse: 73.11s (75.23%)
with_html_content_in_javascript.html
old style: 55.58s
recurse: 32.67s (58.79%)
with_javascript.html
old style: 54.14s
recurse: 31.36s (57.92%)
with_html_content_in_textarea.html
old style: 46.59s
recurse: 27.07s (58.11%)
with_textarea_with_blank_lines.html
old style: 44.96s
recurse: 25.55s (56.82%)
non_ascii_in_excluded_element.html
old style: 40.09s
recurse: 22.13s (55.20%)
with_multiple_comments.html
old style: 41.99s
recurse: 23.58s (56.16%)
multiple_spaces.html
old style: 32.43s
recurse: 16.68s (51.43%)
with_comments_to_exclude.html
old style: 40.55s
recurse: 22.58s (55.68%)
with_comments.html
old style: 40.51s
recurse: 22.56s (55.68%)
with_menu.html
old style: 45.32s
recurse: 26.19s (57.78%)
non_ascii.html
old style: 36.60s
recurse: 20.31s (55.49%)
line_break.html
old style: 28.06s
recurse: 14.07s (50.15%)
with_blank_lines.html
old style: 39.96s
recurse: 22.55s (56.43%)
with_conditional_comments.html
old style: 42.32s
recurse: 23.58s (55.73%)
with_multiple_line_comments.html
old style: 40.68s
recurse: 22.56s (55.47%)
with_old_doctype.html
old style: 39.15s
recurse: 21.03s (53.70%)
Let me know what you guys think about the code in #69
By the way, non-breakable whitespaces (
) are removed from output too.
The extra whitespace in
<a href="asdf">asdf</a>......<a href="asdf">asdf</a>
Should probably collapse down to<a href="asdf">asdf</a>.<a href="asdf">asdf</a>
But your code turns it into<a href="asdf">asdf</a><a href="asdf">asdf</a>
(spaces represented as periods here as the markdown is collapsing my whitespace)