ankane / ahoy_email

First-party email analytics for Rails
MIT License
1.11k stars 137 forks source link

Fix incorrect reconstruction of emails with <head> elements #150

Closed gkampjes closed 3 years ago

gkampjes commented 3 years ago

Changes

Context

As of v2.0, the processor uses Nokogiri::HTML::DocumentFragment to parse the email body, rather than Nokogiri::HTML::Document.

Despite what the semantics of Mail::Message#body implies, it actually returns the compete HTML document, including the <head>.

It appears that Nokogiri::HTML::DocumentFragment#parse strips any invalid nodes, essentially anything that's valid within a <body> element. The main impact is that any emails relying on <style> elements in the header will render without any CSS styling in Gmail. Additionally, the style= attribute on the <body> tag gets stripped off.

Parsing a document fragment with Nokogiri::HTML::Document doesn't appear to have any negative impacts, it simply adds the required elements to create a valid HTML document. There is a possibility that automatically added doctype may cause some formatting issues, and if that's a concern, I can default the doctype to HTML5.

Before/After behaviour

With a full HTML document

<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html charset=UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Test Title</title>
  <style type="text/css">
    a {color: red;}
  </style>
</head>
<body style="background-color:#ABC123;">
<p>
  <a href="https://example.org/">Test Link</a>
</p>
</body>
</html>

Parsing as a Nokogiri::HTML::DocumentFragment (current behaviour) mangles the HTML into this:

<body>

  <meta http-equiv="Content-Type" content="text/html charset=UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Test Title</title>
  <style type="text/css">
    a {color: red;}
  </style>

<p>
  <a href="https://example.org/?utm_source=processor_test_mailer&amp;utm_medium=email&amp;utm_campaign=email">Test Link</a>
</p>

</body>

Using Nokogiri::HTML::Document, the email correctly is processed into this:

<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html charset=UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Test Title</title>
  <style type="text/css">
    a {color: red;}
  </style>
</head>
<body style="background-color:#ABC123;">
<p>
  <a href="https://example.org/?utm_source=processor_test_mailer&amp;utm_medium=email&amp;utm_campaign=email">Test Link</a>
</p>
</body>
</html>

With a document fragment

<p>
  <a href="https://example.org/">Test Link</a>
</p>

Current behaviour (Nokogiri::HTML::DocumentFragment):

<p>
  <a href="https://example.org/?utm_source=processor_test_mailer&amp;utm_medium=email&amp;utm_campaign=email">Test Link</a>
</p>

Using Nokogiri::HTML::Document:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>
  <a href="https://example.org/?utm_source=processor_test_mailer&amp;utm_medium=email&amp;utm_campaign=email">Test Link</a>
</p>
</body></html>
gkampjes commented 3 years ago

Thanks for the feedback!

Should I move the test cases to utm_params_test.rb, click_test.rb, or both? And should I also scrap fixtures/email.html and inline the html in test mailer(s) instead?

ankane commented 3 years ago

Looks great, thanks @gkampjes! Pushing a new release now.