MyIntervals / emogrifier

Converts CSS styles into inline style attributes in your HTML code.
https://www.myintervals.com/emogrifier.php
MIT License
907 stars 154 forks source link

Breaking change in 7.1 #1239

Closed mortenscheel closed 11 months ago

mortenscheel commented 11 months ago

1214 changes the default formatting, which is a breaking change.

I can't tell from the readme whether you're using semantic versioning or not, but some of my tests started failing (some line breaks went missing) after running composer update, and finding the cause took a bit of effort. I guess it's kind of grey area since it's only changing whitespace, but anything that changes what the package produces, should probably be a new major version.

But still, thanks for your work. It's a great package.

oliverklee commented 11 months ago

Yes, we strive to follow Semantic Versioning.

This indeed is a gray area.

We didn't consider this as a breaking change as the functionality of the generated HTML is not affected by the formatting.

In general, I would recommend to write tests for HTML (and XML) to be formatting-independent in order to make them less brittle.

@JakeQZ What do you think?

oliverklee commented 11 months ago

And @mortenscheel, thanks for reporting this!

JakeQZ commented 11 months ago

I notice tha the changelog entry says "Disable HTML formatting by default".

Although there are no methods or parameters in CssInliner to explicitly control the behaviour, there is a method to get the internal DOMDocument object, on which the formatOutput property can be set.

The following should achieve the previous behaviour, though I've not tested it (and setting the formatOutput property after loading the HTML might result in a different behaviour from when it is set before loading the HTML):

$cssInliner = CssInliner::fromHtml($html);
$cssInliner->getDomDocument()->formatOutput = true;
$visualHtml = $cssInliner->inlineCss($css)->render();

The old behaviour (in in which the rendered HTML is formatted) could be considered to be a bug, for example if list items are displayed as inline-block and do not have white space between them, but the formatting introduces some, or if some elements have the white-space property set (e.g. to pre), and extra whitespace is added by the formatting which then affects the layout, though I have not tested these scenarios.

mortenscheel commented 11 months ago

In general, I would recommend to write tests for HTML (and XML) to be formatting-independent in order to make them less brittle.

Do you have any tips on how to do that? Currently I'm comparing the output html to a fixture, and I don't know how I can ignore only the redundant whitespace. Except maybe running both throw a DomDocument, but that seems like a hassle.

oliverklee commented 11 months ago

@mortenscheel The XML-specific assertions of PHPUnit might be helpful: https://docs.phpunit.de/en/10.4/assertions.html#xml

oliverklee commented 11 months ago

Or maybe https://github.com/caxy/php-htmldiff is worth a try.

JakeQZ commented 11 months ago

The XML-specific assertions of PHPUnit might be helpful: https://docs.phpunit.de/en/10.4/assertions.html#xml

Looks like this internally uses DOMDocument, but it may only allow XML or XHTML if it uses the loadXML method and not the loadHTML method.

I was thinking you could just use regex to collapse all whitespace sequences to a single space:

$normalizedHtml = \preg_replace('/\\s++/', ' ', $html);

This would have the drawback of replacing all line breaks with regular spaces, making the differences diffucult to see. That could be mitigated against by changing spaces after tags to line breaks:

$normalizedHtml = \str_replace('> ', ">\n", \preg_replace('/\\s++/', ' ', $html));

Though differences where whitespace is important, such as preformatted text, wouldn't generate test failures, so you'd have to check those cases differently.

oliverklee commented 11 months ago

Please note that HTML cannot be fully parsed with regular expressions. :wink:

JakeQZ commented 11 months ago

Please note that HTML cannot be fully parsed with regular expressions. 😉

Indeed, though the above is not parsing it, just removing extra whitespace.

oliverklee commented 11 months ago

Converting to a discussion as this has turned from a potential bug report to a discussion. :-)