FlominatorTM / wikiblame

http://wikipedia.ramselehof.de/wikiblame.php
GNU General Public License v3.0

Revision history not filtered correctly #52

Open kidhanis opened 3 months ago

kidhanis commented 3 months ago

Observed behavior

I tried wikiblame on Main Page in 2016 and found two problems with how the revision history is filtered.

  1. The variable $is_deleted_revision can be false for deleted revisions whose username has been redacted. As a result, the $versions array contains revisions for which idfromurl() cannot return a valid $id. Here are the 20 revisions from Main Page used in the query. The class mw-userlink is normally on the username a element, as in the first and last revisions in this list. On the third revision, however, which shows "Username or IP removed" instead of a username, the class is on a span element. The wikiblame query above reports 19 versions found, so the only revision excluded is the last one, which is the only redacted revision with a linked username.

  2. The setting "Ignore minor changes" does not work, because line 475 of wikiblame.php looks for a string that does not exist. Running the same query with this box checked still reports the same 19 versions found. The element that currently marks minor edits starts with <abbr class="minoredit" and might be the one to target instead.
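
To make the two observations concrete, here is a minimal sketch (in Python rather than the project's PHP, purely for illustration) of how one history row could be checked for both markers. It assumes only what is described above: a redacted username carries the mw-userlink class on a span instead of an a element, and a minor edit is marked with an abbr element of class minoredit. The names HistoryRowFlags and row_flags are hypothetical and do not exist in wikiblame.

```python
from html.parser import HTMLParser


class HistoryRowFlags(HTMLParser):
    """Collects two flags from the HTML of a single history row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.username_redacted = False
        self.minor = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        # Assumption from the report: a redacted username is a span, not a link.
        if tag == "span" and "mw-userlink" in classes:
            self.username_redacted = True
        # Assumption from the report: minor edits use <abbr class="minoredit">.
        if tag == "abbr" and "minoredit" in classes:
            self.minor = True


def row_flags(row_html):
    """Return (username_redacted, minor) for one history row."""
    parser = HistoryRowFlags()
    parser.feed(row_html)
    parser.close()
    return parser.username_redacted, parser.minor
```

A row for which username_redacted is true could then be skipped in the same way as the other deleted revisions, and a row for which minor is true could be skipped when "Ignore minor changes" is checked. Whether this matches every history layout has not been verified.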

Possible solution

The MediaWiki API could simplify the code for filtering revisions, although each request is limited to 500 revisions. Here's how the request URL could look:

"https://".$server."/w/api.php?action=query&prop=revisions&rvlimit=$limit&rvstart=$offset&titles=".$articleenc."&rvprop=ids|timestamp|flags|sha1&format=json&formatversion=2"

See Main Page revision using API. In the rvprop parameter, the flags value adds a minor boolean to the output, and the sha1 value adds a sha1hidden boolean whenever the revision content was redacted.
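
As a rough sketch of the filtering this would allow (again in Python for illustration only, not the project's PHP), the first function below builds the same query string as the URL above, and the second keeps only the revision IDs that are usable. The function names, the ignore_minor parameter, and the sample values in the usage note are hypothetical. The response layout assumed here is the formatversion=2 one, where query.pages is a list and each revision has a revid field.

```python
from urllib.parse import urlencode


def revisions_url(server, title, limit, start):
    """Build the revisions query URL proposed above (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvlimit": limit,
        "rvstart": start,
        "titles": title,
        "rvprop": "ids|timestamp|flags|sha1",
        "format": "json",
        "formatversion": 2,
    }
    # urlencode percent-encodes the title and the "|" separators.
    return "https://" + server + "/w/api.php?" + urlencode(params)


def usable_revision_ids(response, ignore_minor=False):
    """Return revision IDs, skipping redacted content and optionally minor edits."""
    ids = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            # sha1hidden is only present when the content was redacted.
            if rev.get("sha1hidden"):
                continue
            if ignore_minor and rev.get("minor"):
                continue
            ids.append(rev["revid"])
    return ids
```

For example, revisions_url("en.wikipedia.org", "Main Page", 20, "2016-12-31T23:59:59Z") would produce the request, and the decoded JSON response could be passed to usable_revision_ids with ignore_minor set from the "Ignore minor changes" checkbox. Because of the 500-revision limit, longer histories would need several requests.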