Closed: ryelle closed this 10 months ago
Good question, I also spent some time tracking down where the excerpt is generated. The import parsing starts around here, and on line 490 the first sentence of markdown content is used. Maybe it was assumed it would always be plain text? Only after that is the markdown content parsed to HTML, and that's saved as the post content.
I suppose we could flip those, and pull out the first sentence after parsing as HTML, instead. Strip out the tags, then save it.
Ah nice, yeah that assumption sounds accurate.
> I suppose we could flip those, and pull out the first sentence after parsing as HTML, instead. Strip out the tags, then save it.
Yeah imo it would be more efficient to save the expected content once, at import time, rather than on every page load.
Okay, I've made an update to wporg-markdown, so that the excerpt processing runs after the HTML parsing.
Index: inc/class-importer.php
===================================================================
--- inc/class-importer.php (revision 13104)
+++ inc/class-importer.php (working copy)
@@ -486,12 +486,6 @@
}
$markdown = trim( $markdown );
- // Use the first sentence as the excerpt.
- $excerpt = '';
- if ( preg_match( '/^(.+)/', $markdown, $matches ) ) {
- $excerpt = $matches[1];
- }
-
// Transform to HTML and save the post
$parser = new WPCom_GHF_Markdown_Parser();
$parser->preserve_shortcodes = false;
@@ -499,6 +493,12 @@
$html = apply_filters( 'wporg_markdown_after_transform', $html, $this->get_post_type() );
+ // Use the first line as the excerpt, but first strip any HTML.
+ $excerpt = '';
+ if ( preg_match( '/^(.+)/', wp_strip_all_tags( $html ), $matches ) ) {
+ $excerpt = $matches[1];
+ }
+
add_filter( 'wp_kses_allowed_html', [ $this, 'wp_kses_allow_links' ], 10, 2 );
$post_data = array(
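To illustrate why the order matters, here is a minimal standalone sketch, not the plugin's actual code. It assumes `strip_tags()` as a stand-in for WordPress's `wp_strip_all_tags()`, and the sample `$markdown` and `$html` strings are hypothetical (the HTML is hand-written, not output from `WPCom_GHF_Markdown_Parser`). The `first_line()` helper is also made up for the example; it only wraps the same `preg_match()` pattern used in the patch.

```php
<?php
// Standalone sketch: strip_tags() stands in for wp_strip_all_tags(),
// and $html is hand-written rather than produced by the real parser.

/**
 * Hypothetical helper: return the first line of a string using the same
 * pattern as the patch. Without the "s" modifier, "." stops at a newline.
 */
function first_line( string $text ): string {
	if ( preg_match( '/^(.+)/', $text, $matches ) ) {
		return $matches[1];
	}
	return '';
}

// Hypothetical handbook source and the HTML a markdown parser might emit.
$markdown = "The `foo` prop is **deprecated**.\n\nUse `bar` instead.";
$html     = "<p>The <code>foo</code> prop is <strong>deprecated</strong>.</p>\n\n<p>Use <code>bar</code> instead.</p>";

// Old order: excerpt taken from raw markdown, so the syntax leaks through.
$old_excerpt = first_line( trim( $markdown ) );

// New order: excerpt taken from the rendered HTML after stripping tags.
$new_excerpt = first_line( trim( strip_tags( $html ) ) );

echo $old_excerpt, "\n";
echo $new_excerpt, "\n";
```

With the old order the excerpt keeps the backticks and asterisks; with the new order it is plain text, which is closer to how a native WordPress excerpt behaves.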
You can re-parse all handbook pages by adding `add_filter( 'wporg_markdown_check_etags', '__return_false' );`, then running `yarn wp-env run cli "wp cron event run --all"`.
> Okay, I've made an update to wporg-markdown, so that the excerpt processing runs after the HTML parsing.
I tried to update my local plugin but couldn't, so I just applied this patch and it worked 👍
I don't see the change on trunk, assume you still need to ship it?
> I don't see the change on trunk, assume you still need to ship it?
That's right, I added the diff here for "review" since that plugin is trac-based. I figured if that looks good, I'd ship that patch & close this PR.
Merged into wporg-markdown: https://meta.trac.wordpress.org/changeset/13119
Fixes #465. The markdown rendered in the excerpt matches one of the legacy notice shortcodes (`[tutorial]`), which causes the shortcode processing to output here. It's also unexpected for markdown to appear in the search results, since excerpts are typically plain text. This update parses the markdown and strips the resulting HTML, so that these excerpts behave more like native WordPress excerpts.
To test:
/block-editor/?s=deprecation
Try on the code reference too; there should be no change.