Closed pixeline closed 1 year ago
@pixeline check PR: https://github.com/Simply-Static/simply-static/pull/140
That should be enough.
Currently, we already have the action there, but not the methods set there to access protected data.
@igorbenic Thanks! I just tested it and it works super well!
I just noticed something a bit strange: the log shows the error message, but the final log does not show it.
The code I used:
add_action( 'ss_after_extract_and_replace_urls_in_html', 'cancel_if_invalid_html', 20, 2 );
function cancel_if_invalid_html( $dom, $extractor ) {
if ( !is_html_valid( $dom->innerHtml )) {
$message = 'Invalid HTML on URL ' . $extractor->get_static_page()->url;
// Abort the job if the HTML is invalid.
\Simply_Static\Plugin::instance()->get_archive_creation_job()->cancel( $message );
}
}
if (!function_exists('is_html_valid')) {
function is_html_valid($html)
{
$start = strpos($html, '<');
$end = strrpos($html, '>', $start);
$len = strlen($html);
if ($end !== false) {
$html = substr($html, $start);
} else {
$html = substr($html, $start, $len - $start);
}
libxml_use_internal_errors(true);
libxml_clear_errors();
$xml = simplexml_load_string($html);
return count(libxml_get_errors()) == 0;
}
}
Not sure what is going on there. I guess the "parent" job took over the log entry slot and overwrote the message ?
We had an issue today where invalid html (a closing div tag was missing) was pushed to our sinatra-based website. It uses Nokogiri, which is very fault intolerant and, not seeing the missing div, removed the faulty container, thus creating a visual havoc on our public frontend.
I am looking at a potential way to spot this earlier in the chain and am wondering whether Simply Static could perhaps do it, since if I understand correctly, it renders the html using SimpleHtmlDom.
Is there a hook I could use to ask Simply Static to stop whenever an invalid html is detected, and issue a message with the url containing the invalid html code ?
Patrick by email told me:
I made a simple function that checks whether the html is well formed (all open tags are closed).
Would it be okay for you to add a filter so that I can hook into it, so that I could do something like "interrupt the static process and add a final message with a description of the issue (url of the invalid html) in the SSP Console ?
When this works, I can definitely contribute this as a gist so you can showcase it on the plugin website. I feel it would solve a real business need if the Simply Static static generation process is able to stop when potential website defacing is pushed. Especially in the age of CI/CD, in a WordPress ecosystem full of plugins of various quality...
Thank you for your support!