cobrateam / django-htmlmin

HTML minifier for Python frameworks (not only Django, despite the name).
http://pypi.python.org/pypi/django-htmlmin
BSD 2-Clause "Simplified" License
542 stars, 73 forks

Modify whitespace regex to keep non-breaking spaces #84

Closed by raphaelm 9 years ago

raphaelm commented 9 years ago

As noted in issue #83, django-htmlmin currently removes `&nbsp;`. A programmer would normally expect `&nbsp;` characters to be kept, since they are usually put into a document for a reason.

To keep the character, I modified the regex that is used to recognize whitespace. I had to use a lookahead expression so that the pattern can still rely on Python's built-in `\s` character class.
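A minimal sketch of the lookahead idea, assuming Python 3 (where `\s` on `str` patterns matches Unicode whitespace, including U+00A0). The pattern and the names `WHITESPACE_EXCEPT_NBSP` and `collapse_whitespace` are hypothetical illustrations, not the code from this pull request:

```python
import re

# Hypothetical pattern, not copied from the django-htmlmin source.
# (?!\xa0) rejects a non-breaking space at the current position,
# then \s accepts any other whitespace character.
WHITESPACE_EXCEPT_NBSP = re.compile(r"(?:(?!\xa0)\s)+")


def collapse_whitespace(text):
    """Collapse each run of ordinary whitespace to one space, keeping U+00A0."""
    return WHITESPACE_EXCEPT_NBSP.sub(" ", text)


if __name__ == "__main__":
    sample = "a \n\t b\xa0\xa0c"
    print(repr(collapse_whitespace(sample)))
```

The lookahead keeps `\s` as the single source of truth for what counts as whitespace, so only the one excluded character has to be spelled out.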

Please note that the minified HTML will contain the Unicode character `\xA0` instead of `&nbsp;`. This comes from BeautifulSoup's HTML rendering and is consistent with its behaviour for other HTML entities.
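The entity-to-character substitution can be illustrated with the standard library alone. This sketch uses `html.unescape` and `html.escape` as a stand-in for a parse-and-render round trip; it does not call BeautifulSoup, so treat it as an analogy rather than a reproduction of the library's behaviour:

```python
import html

# Parsing side: the named entity is decoded to the character U+00A0.
decoded = html.unescape("10&nbsp;km &amp; more")

# Rendering side: only markup-significant characters are escaped again,
# so the non-breaking space stays a literal character.
reencoded = html.escape(decoded)

if __name__ == "__main__":
    print(repr(decoded))
    print(repr(reencoded))
```

Both forms display identically in a browser as long as the document is served with an encoding that can represent U+00A0, such as UTF-8.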

andrewsmedina commented 9 years ago

@raphaelm thank you!