Unbreak Page Preview Feature: WMF broke Page Previews by making PCS private

freephile commented 1 week ago

The Page Preview feature (also called 'Popups') gives a thumbnail representation of a page when the reader hovers the mouse over a hyperlink to a wiki article. WMF took this feature out of the (internal) MediaWiki REST API and created a separate Wikimedia REST API called RESTBase PCS (Page Content Service) which is only available to WMF projects - not third-party wikis.

Because of this, independent wikis have shown a loss of functionality going back to ~2022 and MediaWiki 1.38.

In 2024, using MediaWiki 1.39, the Text Extracts extension returns the output of the Parser Limit Reporting HTML comment. Even if you turn off Parser Limit Reporting, there is still nominal output, and it's the only thing "found" by the Text Extracts extension (details below).

For most links, the preview shows the misleading error message "There was an issue displaying this preview" when it really means that the extract is empty.

The endpoint in PCS needed for Page Previews is the 'page summary'.
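For reference, here is a minimal sketch of how a page summary URL is formed on a WMF-hosted wiki, assuming the public `/api/rest_v1/page/summary/` path. The function name `page_summary_url` and the example host are illustrative only; a third-party wiki has no such endpoint unless it runs its own equivalent service.

```python
from urllib.parse import quote


def page_summary_url(host: str, title: str) -> str:
    """Build a page summary URL for a title (hypothetical helper)."""
    # Wiki titles use underscores for spaces; everything else,
    # including "/", is percent-encoded.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{host}/api/rest_v1/page/summary/{encoded}"


print(page_summary_url("en.wikipedia.org", "Free software"))
```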

RfC: OpenGraph descriptions in wiki pages (2019)
Add hover-card like summary (og:description) to open graph meta data printing plain summary and headline property in the SameAs schema (2016)
[EPIC] Port page summary endpoint to MW (2024)
TextExtracts extension: Code stewardship review, aka Text Extracts is unmaintained (2020)
RESTBase
RESTBase deprecation (2019)
RESTBase gerrit repo
RESTBase github clone

freephile commented 1 week ago

Popups Gadget as alternative

Available for WMF projects ONLY: the hostname is hardcoded into the Gadget as a list of WMF projects and interwiki links, and aspects like 'list of authors' rely on more WMF-only tools.

~~As an alternative workaround, consider the Popups Gadget~~

freephile commented 1 week ago

Tried these configs to no avail while researching the state of brokenness.

wfLoadExtension( "PageImages" );
wfLoadExtension( "EventLogging" );
wfLoadExtension( "Popups" );
// Don't hide the preference
$wgPopupsHideOptInOnPreferencesPage = false;
$wgPopupsOptInDefaultState = '1';
$wgPopupsReferencePreviewsBetaFeature = false;
// turn off Parser HTML comment
$wgEnableParserLimitReporting = false;
// Use as much as possible for Text Extracts
$wgPopupsTextExtractsIntroOnly = false;
// Do not avoid certain pages and tags
$wgExtractsRemoveClasses =  [];
// Use the TextExtracts extension for PopUps
$wgPopupsGateway = 'mwApiPlain';

freephile commented 5 days ago

Surprisingly, using an empty array for $wgExtractsRemoveClasses did not work (it was silently ignored), while setting it to an empty string does override the value provided in extension.json.

In order to see the effect of any code change, I had to restart php-fpm so that PHP changes would be interpreted, and also restart memcache so that new values for the extract would be computed.

TL;DR

Add any one of the lines below to 'LocalSettings.php' (or equivalent file) to override the TextExtracts extension configuration.

$wgExtractsRemoveClasses = '';
$wgExtractsRemoveClasses = null;
$wgExtractsRemoveClasses = false;

freephile commented 5 days ago

TextExtracts Code changes between 1.35 and 1.39

https://github.com/wikimedia/mediawiki-extensions-TextExtracts/compare/wmf/1.35.0-wmf.30...wmf/1.39.0-wmf.28

New changes since 1.39 to master

https://github.com/wikimedia/mediawiki-extensions-TextExtracts/compare/wmf/1.39.0-wmf.28...master

freephile commented 5 days ago

Parser Limit report

In the page source, there is an HTML comment output like the following:

<!-- 
NewPP limit report
Complications: []
[SMW] In‐text annotation parser time: 0 seconds
-->

And that's when the feature is turned OFF with $wgEnableParserLimitReporting = false;

If the report were actually generated, it would look more like this:

<!-- 
NewPP limit report
Cached time: 20241107144457
Cache expiry: 86400
Reduced expiry: false
Complications: []
[SMW] In‐text annotation parser time: 0 seconds
CPU time usage: 0.112 seconds
Real time usage: 0.114 seconds
Preprocessor visited node count: 1/1000000
Post‐expand include size: 0/2097152 bytes
Template argument size: 0/2097152 bytes
Highest expansion depth: 1/100
Expensive parser function count: 0/100
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 0/5000000 bytes
ExtLoops count: 0/100
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00%    0.000      1 -total
-->

Whatever the report contains, it is the only content picked up by the TextExtracts extension unless the $wgExtractsRemoveClasses config is overridden.

IOW, you get an HTML comment (not visible to the user)

                "extract": "<!-- \nNewPP limit report\nCached time: 20241107144825\nCache expiry: 86400\nReduced expiry: false\nComplications: []\n[SMW] In\u2010text annotation parser time: 0 seconds\nCPU time usage: 0.178 seconds\nReal time usage: 0.191 seconds\nPreprocessor visited node count: 1/1000000\nPost\u2010expand include size: 0/2097152 bytes\nTemplate argument size: 0/2097152 bytes\nHighest expansion depth: 1/100\nExpensive parser function count: 0/100\nUnstrip recursion depth: 0/20\nUnstrip post\u2010expand size: 0/5000000 bytes\nExtLoops count: 0/100\n-->\n<!--\nTransclusion expansion time report (%,ms,calls,template)\n100.00%    0.000      1 -total\n-->"

instead of a usable visible extract like this:

"extract": "<div class=\"mw-parser-output\"><div class=\"thumb tright\"><div class=\"thumbinner\" style=\"width:302px;\">  <div class=\"thumbcaption\"><div class=\"magnify\"></div>Inspired by the Free Software Foundation</div></div></div>\n<p>Free Software was started by all the original hackers who invented the first computers and the programs to run those computers. Ideas and information were shared freely, because that made sense. It was the best way to learn and make progress. It had been that way since the dawn of time.\n</p><p>Then corporate interests stepped in, creating a tremendous impedence to progress, poorer technology choices, the richest man in the world, and massive monopolies of power and information, all in less than 25 years.\n</p><p>By the late 80's, one man started a new idea: Dr. Richard M. Stallman from MIT. RMS started the Free Software Foundation, and invented a legal mechanism using copyright to protect your freedom. This is called the General Public License (</p></div>..."

from wiki/api.php?action=query&prop=extracts&exchars=1000&titles=Free%20Software
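To spot this failure mode across many pages, something like the following sketch could help. It builds the same query URL and flags extracts that contain nothing but HTML comments. The helper names are hypothetical, `format=json` is added so the response can be parsed, and the response shape assumed is the default JSON layout with pages keyed under `query.pages`.

```python
import re
from urllib.parse import quote, urlencode

# Matches one HTML comment, including multi-line ones
# such as the NewPP limit report.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def extracts_url(api_base: str, title: str, exchars: int = 1000) -> str:
    """Build the prop=extracts query URL (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exchars": exchars,
        "titles": title,
        "format": "json",
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{api_base}?{urlencode(params, quote_via=quote)}"


def is_comment_only(extract: str) -> bool:
    """True if nothing remains once HTML comments and whitespace are removed."""
    return HTML_COMMENT.sub("", extract).strip() == ""


def pages_without_usable_extract(response: dict) -> list:
    """List titles whose extract is missing, blank, or comment-only."""
    pages = response.get("query", {}).get("pages", {})
    return [
        page.get("title", "")
        for page in pages.values()
        if is_comment_only(page.get("extract", ""))
    ]
```

Fetching the URL and decoding the JSON is left to whatever HTTP client is at hand; the check itself only needs the decoded dictionary.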

freephile commented 5 days ago

Note about Page Images

If doing a wiki migration (e.g. creating a new wiki from a DB dump) instead of starting from scratch, you will need to initialize the "Page Images" property for all your existing pages.

For instance, we ran the following on a Meza migration of the 'wiki' wiki:

WIKI=wiki php /opt/htdocs/mediawiki/extensions/PageImages/maintenance/initImageData.php

freephile commented 3 days ago

This isn't really fixed - yet.

I was not satisfied with unsetting $wgExtractsRemoveClasses, because it only masks the problem and results in poor extracts that contain text that should be excluded. So I set up debugging and traced the code execution to see exactly what's happening.

When you attempt to add/edit $wgExtractsRemoveClasses, the defaults provided by the extension are always retained. Testing reveals that the removal of the div tag is largely responsible for causing blank extracts for pages that should have extracts.
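The behavior reported in this thread is consistent with a loader that merges array settings with the extension defaults instead of replacing them. The sketch below is a toy Python model of that idea, not MediaWiki code: `resolve_setting` is a hypothetical name, the rules are an assumption inferred from the observations here, and apart from `div` the default values are placeholders.

```python
def resolve_setting(local_value, extension_default, local_is_set=True):
    """Toy model of a local setting meeting an extension default."""
    # Not set locally, or set to an empty array: the default wins outright.
    if not local_is_set or (isinstance(local_value, list) and not local_value):
        return list(extension_default)
    # Set locally to a non-array ('' / None / False): treated as already
    # overridden, so the default is never applied.
    if not isinstance(local_value, list):
        return local_value
    # Both are non-empty arrays: defaults are kept, local entries appended.
    return list(extension_default) + list(local_value)


# Placeholder defaults; only "div" is mentioned in this thread.
DEFAULTS = ["div", ".hypothetical-class"]

print(repr(resolve_setting([], DEFAULTS)))         # empty array is ignored
print(repr(resolve_setting("", DEFAULTS)))         # empty string overrides
print(repr(resolve_setting([".mine"], DEFAULTS)))  # defaults are retained
```

Under this model, no array value set locally can ever remove a default entry, which would explain why only a non-array value changes the outcome.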

These are the default removals.

freephile / meza