approvals / ApprovalTests.Ruby

Approval Tests for Ruby
MIT License
228 stars 44 forks source link

Html approver mangles html with boolean attributes #44

Open markijbema opened 10 years ago

markijbema commented 10 years ago

the following (valid) html:

<!DOCTYPE html>
<html>
<title>Hoi</title>
<script async defer src=\"http://foo.com/bar.js\"></script>
<h1>yo</h1>

becomes this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><script>window.FactlinkProxiedUri = "http://www.example.org/foo?bar=baz";</script><script></script>defer
               src="http://localhost:8000/lib/dist/factlink_loader.js?o=proxy"
               onload="__internalFactlinkState('proxyLoaded')"
         &gt;</html>
kytrinyx commented 10 years ago

See also #7

hotgazpacho commented 10 years ago

So, the html posted above is not a valid HTML document. It's an incomplete fragment (no <body> tag set, no enclosing </html>). How should this be treated? Should it become a valid document, or be left as a fragment?

hotgazpacho commented 10 years ago

To clarify, if we tell nokogiri to treat it as a fragment (after telling it to parse as HTML instead of XML), we get the following:

<title>Hoi</title><script async defer src="http://foo.com/bar.js"></script><h1>yo</h1>

However, if we tell nokogiri to treat it as a document, we get this:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hoi</title>
<script async defer src="http://foo.com/bar.js"></script>
</head>
<body><h1>yo</h1></body>
</html>

Which is correct?

markijbema commented 10 years ago

I would say the latter. If we care about the exact bytes/chars, we can still use text. If we care about content/structure of html, the latter will result in more useful diffs (which is very useful in pullrequests)

markijbema commented 10 years ago

I simplified the example I think. I'm not sure whether actual validity was an issue, and removing the html opening tag would make it valid html5

kytrinyx commented 10 years ago

It's important to NOT change the output.

  1. A lot of the testing is for legacy things that do not have valid structure, and also sometimes for fragments that get loaded elsewhere as part of a larger page.
  2. If the document is invalid, I don't want my approval to show me valid HTML. The golden master file should contain what the response returned -- the only change should be whitespace.

If I have an HTML fragment, I want to have standardised formatting (so :text is not useful in that case), which is why I introduced nokogiri in the first place (I tried using Tidy, but couldn't figure out how to get it running). This avoids spurious whitespace failures.

kytrinyx commented 10 years ago

(Sorry it took so long to respond. I'm at a conference, and I find it very hard to keep track of issues/pull requests during conferences).

markijbema commented 10 years ago

The first half of your comment contradicts the point about the spacing though. Spaces can be very significant in html, and changing it can break your app. Those changes would go undetected, but others wouldn't. I think we agree the only use of a formatter is to change the output. The question is the amount of change it should apply. To me, allowing changing spacing, but not other stuff seems a bit arbitrary.

I don't think it's wise to use the same formatter for html documents, and html fragments. I never assumed this formatter was aso used for html fragments. I'm not sure about the semantics of the XML parsers, but html fragments don't need to be valid xml documents (for instance, they don't need a root).

Regarding invalid documents there are still multiple strategies. Do you think this is an invalid strategie for valid documents? For invalid documents we could (for instance)

  1. raise an exception, and fail the test
  2. output the unformatted html verbatim.
kytrinyx commented 10 years ago

Spaces can be very significant in html, and changing it can break your app.

Ah, see this is where I fall down because I don't know front-end stuff! OK, cool. Thanks so much for clarifying.

I think that either of these strategies could work. @hotgazpacho suggested just outputting a warning. That might be the best thing to do since it's really up to the user what the real thing they're trying to verify is.