kidhanis closed this 3 months ago
The most future-proof solution would be to use the API: https://en.wikipedia.org/w/api.php?action=parse&page=API&prop=text&disableeditsection=&formatversion=2 gives approximately the same result (including the removal of the [bearbeiten] ("edit") links, but in all languages), but the output is generally stable.
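For illustration, here is a minimal Python sketch of that request (the project itself is PHP, so this is not its code). It uses the same parameters as the URL above; `format=json`, the function names, and the User-Agent string are assumptions added for the example. With `formatversion=2`, the rendered HTML is returned as a plain string under `parse.text`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_parse_url(lang: str, title: str) -> str:
    """Build an action=parse URL for one page on a given language wiki."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "disableeditsection": "",  # empty value, as in the URL above
        "format": "json",          # assumption: not part of the URL above
        "formatversion": "2",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_html(payload: dict) -> str:
    """Return the rendered HTML from a decoded API response."""
    if "error" in payload:
        raise ValueError(payload["error"].get("info", "unknown API error"))
    return payload["parse"]["text"]


def fetch_article_html(lang: str, title: str) -> str:
    """Fetch and decode the rendered HTML (performs a network request)."""
    request = Request(
        build_parse_url(lang, title),
        headers={"User-Agent": "parse-api-sketch/0.1 (example)"},  # hypothetical
    )
    with urlopen(request, timeout=10) as response:
        return extract_html(json.load(response))
```

Calling `fetch_article_html("en", "API")` would request the same page as the URL above; the two helpers are kept separate so the URL building and response handling can be checked without network access.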
Thanks for the issue, @kidhanis, and for the suggestion, @tacsipacsi, which I implemented.
I'm currently getting matches to JS code inside HTML `script` tags on English Wikipedia, and it's because `$start_token` inside `chop_content()` is not working: https://github.com/FlominatorTM/wikiblame/blob/64a254548d06d844ce435b58d039039e49abaeab/shared_inc/wiki_functions.inc.php#L318

The article data now starts with `<div class="mw-content-ltr mw-parser-output"`, but there's also `<div class="mw-content-rtl mw-parser-output"` on RTL scripts.
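One way to handle both variants is a start token that accepts either text direction. The following Python sketch only illustrates the idea; it is not the logic of `chop_content()`, and the names `START_TOKEN` and `chop_before_content` are hypothetical.

```python
import re

# Matches the opening of the content div for both LTR and RTL wikis.
START_TOKEN = re.compile(r'<div class="mw-content-(?:ltr|rtl) mw-parser-output"')


def chop_before_content(html: str) -> str:
    """Drop everything before the article content div, if it is present."""
    match = START_TOKEN.search(html)
    if match is None:
        # Start token not found: return the input unchanged rather than guess.
        return html
    return html[match.start():]
```

Returning the input unchanged when the token is missing is a design choice for the sketch; a caller could instead treat a missing token as an error, so that markup changes like this one are noticed early.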