So can you come up with a rule for when spaces should be removed and when they
shouldn't? I can't.
Should spaces be removed here?
<p>
<div style="display:inline">foo</div>
<div style="display:inline">bar</div>
</p>
What about here:
<p>
<span style="display:block">foo</span>
<span style="display:block">bar</span>
</p>
It is impossible to guess your intentions. With CSS you can take pretty much any
HTML element and make it render as something completely different.
Original comment by serg472@gmail.com
on 25 Sep 2010 at 3:25
See http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1:
For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.
This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space (&#x0020;), while in Thai it is a zero-width word separator (&#x200B;). In Japanese and Chinese, inter-word space is not typically rendered at all.
Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field (see [RFC2616], section 14.12), user agent settings, etc.).
The PRE element is used for preformatted text, where white space is significant.
In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write:
<P>We offer free <A>technical support</A> for subscribers.</P>
and not:
<P>We offer free<A> technical support </A>for subscribers.</P>
As stated there, whitespace immediately after a start tag or immediately before
an end tag can be removed, and runs of whitespace characters can be collapsed
into a single space.
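A minimal Python sketch of this reading, assuming a naive regex-based approach. The function name `collapse_and_trim` is hypothetical and not part of any existing compressor, and the sketch ignores `<pre>`, `<textarea>`, `<script>`, comments, void elements such as `<img>`, and `>` inside attribute values.

```python
import re

def collapse_and_trim(html: str) -> str:
    """Hypothetical sketch: collapse whitespace, then drop the space that sits
    directly after a start tag or directly before an end tag."""
    # 1. Collapse every run of whitespace into a single ASCII space.
    html = re.sub(r"\s+", " ", html)
    # 2. Drop a space immediately after a start tag (end tags, comments and
    #    doctypes are excluded by [^/!]; void elements are NOT special-cased).
    html = re.sub(r"(<[^/!][^>]*>) ", r"\1", html)
    # 3. Drop a space immediately before an end tag.
    html = re.sub(r" (</[^>]+>)", r"\1", html)
    return html.strip()
```

On the first `display:inline` example this produces `<p><div style="display:inline">foo</div> <div style="display:inline">bar</div></p>`. Applied to the spec's counter-example `<P>We offer free<A> technical support </A>for subscribers.</P>`, it produces `free<A>technical support</A>for`, gluing the words together; that is why the spec tells authors not to put whitespace in those positions.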
Original comment by o...@mirix.org
on 25 Sep 2010 at 10:22
Your examples should be compressed as follows:
<p><div style="display:inline">foo</div> <div style="display:inline">bar</div></p>
<p><span style="display:block">foo</span> <span style="display:block">bar</span></p>
I think there could be problems with the CSS 3 white-space-collapse property,
but one could argue that it is not the task of an HTML compressor to interpret
CSS. A workaround could be to let users specify the tags, ids, or classes whose
whitespace the HTML compressor should not collapse.
Original comment by o...@mirix.org
on 25 Sep 2010 at 10:27
What about this:
<div>
<div>1</div>
<div>2</div>
</div>
<div>
<div><img/></div>
<div><img/></div>
</div>
Sorry, I still don't see a pattern.
You said the previous example should be compressed like this:
<p><span style="display:block">foo</span> <span style="display:block">bar</span></p>
But who said I don't want a space between <p> and <span>? Maybe I do, maybe I
don't. What about the space after </p>? Maybe I need it there as well.
Original comment by serg472@gmail.com
on 25 Sep 2010 at 4:33
The HTML 4.01 specification says it all. Your example would be:
<div><div>1</div> <div>2</div></div><div><div><img></div> <img></div></div>
HTML5 is more specific about this, since it defines default CSS rules for
elements and otherwise falls back to CSS defaults:
http://lists.whatwg.org/pipermail/help-whatwg.org/2010-September/000665.html
Original comment by o...@mirix.org
on 25 Sep 2010 at 11:13
I made a typo; the example should read:
<div><div>1</div> <div>2</div></div><div><div><img></div> <div><img></div></div>
Original comment by o...@mirix.org
on 25 Sep 2010 at 11:14
I don't see anything in the specs that says which spaces should be removed. Can
you please show me where it says that?
Why is there no space here:
<div>2</div>
</div>
<<<<<<<<<<<<<< here
<div>
<div><img/></div>
in the previous example?
So:
<div>1</div>
<div>2</div>
becomes:
<div>1</div> <div>2</div>
But:
<div> <div>1</div> </div>
<div> <div>2</div> </div>
becomes:
<div><div>1</div></div><div><div>2</div></div>
?
If I have two divs:
<div></div> <div></div>
Should there be a space between them or not?
Original comment by serg472@gmail.com
on 26 Sep 2010 at 6:02
http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1 describes the collapsing
and removal of whitespace. I did quote the relevant paragraphs in Comment #2.
Original comment by o...@mirix.org
on 26 Sep 2010 at 11:55
In addition, you could remove the whitespace between block tags entirely
instead of collapsing it.
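One way to read this suggestion, sketched in Python under loud assumptions: `BLOCK_TAGS` is a hypothetical and incomplete list, "block" means the tag name rather than its CSS `display` value (so the `display:inline`/`display:block` cases raised earlier are not handled), and the regex shares the usual limits of not parsing HTML properly. `remove_space_between_blocks` is an illustrative name only.

```python
import re

# Hypothetical, deliberately short list of block-level tag names; a real
# compressor would need a complete (and probably configurable) list.
BLOCK_TAGS = ("address", "blockquote", "div", "dl", "form", "h1", "h2", "h3",
              "h4", "h5", "h6", "hr", "li", "ol", "p", "table", "td", "th",
              "tr", "ul")

# A start or end tag whose name is in BLOCK_TAGS (\b stops "p" matching "pre").
_TAG = r"</?(?:%s)\b[^>]*>" % "|".join(BLOCK_TAGS)
# Whitespace sitting between two such tags; the second tag is only looked
# ahead at, so it can itself start the next match.
_BETWEEN_BLOCKS = re.compile(r"(%s)\s+(?=%s)" % (_TAG, _TAG), re.IGNORECASE)

def remove_space_between_blocks(html: str) -> str:
    """Delete whitespace-only gaps between adjacent block-level tags."""
    return _BETWEEN_BLOCKS.sub(r"\1", html)
```

On the nested `<div>` example from serg472's comment this yields `<div><div>1</div><div>2</div></div><div><div><img/></div><div><img/></div></div>`, while a gap between inline tags such as `<span>1</span> <span>2</span>`, or between a block tag and plain text, is left alone.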
Original comment by o...@mirix.org
on 26 Sep 2010 at 12:49
A single space always matters. That spec is talking about how spaces are
collapsed when a page is rendered; it doesn't say anything about removing
spaces from the source _before_ rendering. Collapsing multiple spaces into one
everywhere, before rendering, is the only safe way of doing it.
You can't remove a single space at the beginning or end of any tag without
potentially changing how the page renders.
All these:
<span> <span>1</span> </span><span> <span>2</span> </span>
<span> 1 </span><span> 2 </span>
<span>1</span> <span>2</span>
will be rendered as "1 2". Removing any spaces would break it.
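A rough way to check this claim in Python. `collapse_only` is the collapse-to-one-space approach described above, `trim_inside_tags` additionally removes spaces just inside start and end tags, and `rendered_text` is a hypothetical, very crude stand-in for a browser: it strips tags and collapses whitespace, ignoring CSS and block layout entirely. All three names are illustrative only.

```python
import re

def collapse_only(html: str) -> str:
    """Collapse every whitespace run to one space; never delete a space outright."""
    return re.sub(r"\s+", " ", html)

def trim_inside_tags(html: str) -> str:
    """Riskier variant: also drop spaces right after start tags and right before end tags."""
    html = collapse_only(html)
    html = re.sub(r"(<[^/!][^>]*>) ", r"\1", html)
    return re.sub(r" (</[^>]+>)", r"\1", html)

def rendered_text(html: str) -> str:
    """Crude approximation of rendered inline text (assumption: no CSS, no blocks):
    remove tags, collapse whitespace, trim the ends."""
    text = re.sub(r"<[^>]*>", "", html)
    return re.sub(r"\s+", " ", text).strip()
```

With this approximation all three strings above still render as "1 2" after `collapse_only`, while `trim_inside_tags` turns the first two into "12"; the third is unchanged because it has no space directly inside a tag.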
Original comment by serg472@gmail.com
on 26 Sep 2010 at 4:09
Issue 41 has been merged into this issue.
Original comment by serg472@gmail.com
on 3 May 2011 at 3:27
Original issue reported on code.google.com by
o...@mirix.org
on 24 Sep 2010 at 11:50