eduardokum / laravel-mail-auto-embed

MIT License
166 stars 36 forks source link

DOMDocument does not work well with HTML5 #22

Closed Grldk closed 4 years ago

Grldk commented 4 years ago

I just updated this package to master, and noticed exceptions from DOMDocument appearing in my log.

Turns out email parsing changed to DOMDocument, which is far more critical of HTML than the older parsing method, and does not work well with some newer HTML5 entities, both of which break some of my emails.

Would it be possible to add libxml_use_internal_errors(true); to the attachImages() function? @roelofr

Note: I am not complete sure if there are other implications of surpressing these errors. Should errors be cleared after sending mail with libxml_clear_errors() ?

roelofr commented 4 years ago

Hmm, that's weird, since I tested it with some HTML5 entities (an article tag, if my memory is correct).

I could look into this, and add some tests using HTML5 emails, but could you specify which HTML5 tags you're using (or provide some sample HTML), since this issue should be prevented in future updates too.

Maybe using masterminds/html5 internally would help prevent this issue.

Grldk commented 4 years ago

Hey, <article> entities throw warnings as well:

$document = new DOMDocument();

$document->loadHTML('<article>Test</article>');

PHP Warning:  DOMDocument::loadHTML(): Tag article invalid in Entity, line: 1 in Psy Shell code on line 1

Other entities like nav and section won't work either: https://stackoverflow.com/questions/6090667/php-domdocument-errors-warnings-on-html5-tags

And apart from that I have to support user generated (sanitized of course!) html, which would work fine with the old parser and can throw unexpected errors with the DOMDocument parser.

A <table> inside a <p> can happen with CKEditor for example.. Not valid HTML, but works fine in all email clients I have tried, but DOMDocument doesnt like it..

roelofr commented 4 years ago

I'll see if I have time early next week for a PR that fixes HTML5 and adds some tests.

And apart from that I have to support user generated (sanitized of course!) html, which would work fine with the old parser and can throw unexpected errors with the DOMDocument parser.

I think this is a peculiar edge case. The old Regex-way of matching HTML is, as mentioned in [this article][1] a rather bad practice, which is why I recommended and programmed the DOM parsing. Maybe the html5 library i suggested can also handle this quietly. I'll certainly try to take it into account with the tests, as email clients aren't known for their strict handling of HTML of course.

roelofr commented 4 years ago

Well, that was a lot easier than expected.