bem / html-differ

Compares two HTML
http://bem.info/tools/testing/html-differ/
MIT License
211 stars, 44 forks

[ignoreWhitespace] are newlines equal to spaces? #150

Open Feder1co5oave opened 6 years ago

Feder1co5oave commented 6 years ago

I'd like to use html-differ to write a test system for marked, where the generated HTML output is compared against pre-written HTML to determine whether the two are equivalent. I would therefore like to ignore whitespace differences where they are not meaningful (as in <p> tags), but I noticed that newlines are treated differently from spaces:

<p>A link. Not anymore.</p>

is not considered equal to

<p>A link.
Not anymore.</p>

when, IMO, they should be.

I've traced this to what is probably this line of code (line 39 on the right).

I'm aware this issue is related to CSS, and not HTML itself. According to CSS specification, when the white-space property is not pre, pre-wrap, or pre-line, segment breaks (i.e. newlines) are usually converted to a space U+0020.
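
As a rough illustration of the rule described above, here is a minimal sketch of that collapsing for non-pre white-space values. It is a simplification: `collapseWhitespace` is a hypothetical helper, not part of html-differ, and it ignores the special cases the CSS specification defines (for example around zero-width spaces and some East Asian scripts).

```javascript
// Simplified sketch of CSS white-space collapsing for non-pre values.
// collapseWhitespace is a hypothetical helper, not part of html-differ.
function collapseWhitespace(text) {
  return text
    // Remove spaces and tabs immediately around each segment break.
    .replace(/[ \t]*(\r\n|\r|\n)[ \t]*/g, '\n')
    // Collapse consecutive segment breaks and turn them into one space.
    .replace(/\n+/g, ' ')
    // Convert tabs to spaces and collapse runs of spaces.
    .replace(/[ \t]+/g, ' ');
}

// The two paragraph bodies from this issue normalize to the same string.
console.log(collapseWhitespace('A link.\nNot anymore.') === 'A link. Not anymore.');
```

Under this simplified model a newline inside the paragraph text ends up as a single space, which is why the two <p> examples above would be expected to compare as equal.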

eGavr commented 6 years ago

Hi!

Spaces are not considered equal to newlines inside HTML tags...

<p>
Hello, Bob!
</p>

and

<p>Hello, Bob!</p>

are equal, but

<p>Hello, Bob!</p>

and

<p>Hello, 
Bob!</p>

are not!


If you need different behaviour, I think you could send a pull request that adds an option for it.

Feder1co5oave commented 6 years ago

Ah, now I see what went wrong there. Since you strip all newlines in ignoreWhitespace(), the following two paragraphs are equivalent:

<p>wrapped
paragraph</p>

<p>wrappedparagraph</p>

const HtmlDiffer = require('html-differ').HtmlDiffer;
const htmlDiffer = new HtmlDiffer();
const first = '<p>wrapped\nparagraph</p>';
const second = '<p>wrappedparagraph</p>';
console.log(htmlDiffer.isEqual(first, second));
// > true

I think you should replace them with a space instead, to follow the segment break transformation rules that CSS applies for non-pre white-space values. Adjacent spaces are then collapsed into a single one in the next step, so the following two paragraphs would be equivalent:

<p>wrapped
paragraph</p>

<p>wrapped paragraph</p>

which looks correct to me!
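
To make the difference between the two strategies concrete, here is a small self-contained sketch. Both helpers are hypothetical and only mimic the behaviours discussed in this thread; they are not html-differ's actual implementation.

```javascript
// Both helpers are hypothetical illustrations, not html-differ's code.

// Behaviour described in this thread: newlines are simply removed.
const stripNewlines = (s) => s.replace(/\n/g, '');

// Proposed behaviour: newlines become spaces, then runs of spaces collapse.
const newlinesToSpaces = (s) => s.replace(/\n/g, ' ').replace(/ {2,}/g, ' ');

const wrapped = '<p>wrapped\nparagraph</p>';

// Stripping glues the two words together.
console.log(stripNewlines(wrapped) === '<p>wrappedparagraph</p>');

// Replacing keeps the word boundary.
console.log(newlinesToSpaces(wrapped) === '<p>wrapped paragraph</p>');
```

With the first helper the wrapped paragraph matches the run-together form, while with the second it matches the space-separated form, which is the outcome argued for above.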