dimobelov closed 5 years ago
oh, great, thanks Dimo!
Do you have some before/after source code I could add to a test for this?
as in, the HTML source that was previously not being converted properly
Snippet from homepage index.html. Without fix:
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>wpnotes | Поредният WordPress сайт</title>
<meta name="description" content="Поредният WordPress сайт on wpnotes…">
<meta property="og:locale" content="en_US">
<meta property="og:type" content="website">
<meta property="og:title" content="wpnotes | Поредният WordPress сайт">
<meta property="og:description" content="Поредният WordPress сайт on wpnotes…">
<meta property="og:url" content="https://dimobelov.gitlab.io/wpstatic/">
<meta property="og:site_name" content="wpnotes">
<meta name="twitter:card" content="summary"> ...
With fix:
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>wpnotes | Поредният WordPress сайт</title>
<meta name="description" content="Поредният WordPress сайт on wpnotes…">
<meta property="og:locale" content="en_US">
<meta property="og:type" content="website">
<meta property="og:title" content="wpnotes | Поредният WordPress сайт">
<meta property="og:description" content="Поредният WordPress сайт on wpnotes…">
<meta property="og:url" content="https://dimobelov.gitlab.io/wpstatic/">
<meta property="og:site_name" content="wpnotes">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="wpnotes | Поредният WordPress сайт">
<meta name="twitter:description" content="Поредният WordPress сайт on wpnotes…">
...
Perfect, thanks!
A few other fixes, cleanups and tests are being added at the moment; I'll get this into the next release.
Btw, it's good to double-decode the output HTML. This is a known issue with Unicode.
$processed_html = html_entity_decode($processed_html, ENT_QUOTES, 'UTF-8');
// and again
$processed_html = html_entity_decode($processed_html, ENT_QUOTES, 'UTF-8');
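A minimal sketch of why a second pass can matter, assuming the output contains entities whose ampersand was escaped a second time (for example `&amp;#1055;` instead of `&#1055;`). The helper name `decodeEntitiesTwice` and the sample string are made up for illustration and are not part of the plugin:

```php
<?php
// Hypothetical helper, not part of the plugin: run html_entity_decode twice.
function decodeEntitiesTwice(string $html): string
{
    $once = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    return html_entity_decode($once, ENT_QUOTES, 'UTF-8');
}

// Assumed input: the numeric entity for "П" with its ampersand escaped again.
$doubleEncoded = '<title>&amp;#1055;оредният</title>';

// One pass only turns "&amp;" back into "&", leaving a numeric entity behind.
echo html_entity_decode($doubleEncoded, ENT_QUOTES, 'UTF-8'), "\n";

// The second pass resolves that remaining numeric entity to the character.
echo decodeEntitiesTwice($doubleEncoded), "\n";
```

One caveat with this approach: the second pass also unescapes text that was escaped on purpose, so `&amp;lt;` ends up as a literal `<`.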
@dimobelov, I'm not having any joy with testing this:
HTMLProcessorUnicodeSupport
✘ Unicode output data set "unicode characters in source"
│
│ Failed asserting that two strings are equal.
│ --- Expected
│ +++ Actual
│ @@ @@
│ '<!DOCTYPE html>\n
│ -<html lang="en-US"><head></head><title>wpnotes | Поредният WordPress сайт</title><body></body></html>\n
│ +<html lang="en-US"><head></head><title>wpnotes | Поредният WordPress сайт</title><body></body></html>\n
│ '
│
│ /home/leon/example.com/site/web/app/plugins/static-html-output-plugin/provisioning/tests/HTMLProcessor/unicodeSupportTest.php:48
Even with multiple decodings... Any ideas?
OK, some progress after adding <meta charset="utf-8"/> to the test inputs.
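A rough sketch of why the charset declaration could make a difference, assuming the HTML is parsed with PHP's DOMDocument (an assumption; the thread does not show the parser). Without a declared encoding, the underlying libxml2 HTML parser may not treat the input bytes as UTF-8. The helper `parseTitle` and both sample documents are hypothetical:

```php
<?php
// Hypothetical helper for illustration; not the plugin's actual code.
function parseTitle(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $title = $doc->getElementsByTagName('title')->item(0);
    return $title === null ? '' : $title->textContent;
}

$title = 'wpnotes | Поредният WordPress сайт';

// Same document with and without an explicit charset declaration.
$withCharset = '<!DOCTYPE html><html lang="en-US"><head><meta charset="utf-8"/>'
    . "<title>$title</title></head><body></body></html>";
$withoutCharset = '<!DOCTYPE html><html lang="en-US"><head>'
    . "<title>$title</title></head><body></body></html>";

// Compare what the parser read back against the original string.
var_dump(parseTitle($withCharset) === $title);
var_dump(parseTitle($withoutCharset) === $title);
```

The result for the second document may depend on the installed libxml2 version, so it is printed here rather than assumed.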
Fix: https://github.com/leonstafford/wp2static/blob/e7bc17116859a10c0b8b2c1c95f0215dda3b6ca3/library/StaticHtmlOutput/HTMLProcessor.php#L611