yves-renier closed this issue 1 year ago
The issue arises from the syntax error in `===''The Times They Are A-Changin''' sessions, part 1===` (two opening quotes, three closing quotes), which makes mwparserfromhell parse it incorrectly. This is a known issue with this package. Most likely Wikipedia has some sort of exception for these cases so that they still display correctly, but trying to account for them ourselves can easily become a rabbit hole.
Still, I will try a few things to see if I can circumvent this particular case.
Update: I just noticed that the title of the album is `The Times They Are A-Changin'`, with a trailing single quote 🤦♀️, so this is not (only) a syntax error...
Fixed in 96ab0aa.
The workaround I implemented is to check whether a heading has an odd number of single quotes (`'`) and, in that case, to replace the italics opening and closing wikitext tags (`''`) with the HTML tags (`<i>` and `</i>`) around dropped g's in -ing endings before parsing anything. These tags are kept in the parsed HTML but removed from every other field, in particular from the headings list.
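For readers who want to see the idea in code, here is a minimal sketch of the workaround described above. It is not the code from 96ab0aa: the names `patch_headings` and `strip_italic_tags`, the regular expressions, and the placeholder character are all assumptions made for illustration, and it only handles the `in'''` pattern from this issue.

```python
import re

# Hypothetical helpers, not the project's actual implementation.
# A wikitext heading line: "== ... ==" up to "====== ... ======".
HEADING_RE = re.compile(r"^(=+)(.*?)(=+)[ \t]*$", re.MULTILINE)
# An apostrophe standing in for a dropped "g" (e.g. "Changin'"),
# immediately followed by an italics marker ('').
DROPPED_G_RE = re.compile(r"(?<=in)'(?='')")
PLACEHOLDER = "\x00"  # assumed never to occur in real wikitext


def _fix_heading(match):
    opening, body, closing = match.groups()
    # Only touch headings with an odd number of single quotes.
    if body.count("'") % 2 == 0:
        return match.group(0)
    # Protect the literal apostrophe so it is not read as markup.
    body = DROPPED_G_RE.sub(PLACEHOLDER, body)
    parts = body.split("''")
    # Balanced italics markers give an odd number of parts.
    if len(parts) % 2 == 0:
        return match.group(0)
    out = parts[0]
    for i, part in enumerate(parts[1:]):
        out += ("<i>" if i % 2 == 0 else "</i>") + part
    return opening + out.replace(PLACEHOLDER, "'") + closing


def patch_headings(wikitext):
    """Rewrite italics in odd-quoted headings as <i>...</i> before parsing."""
    return HEADING_RE.sub(_fix_heading, wikitext)


def strip_italic_tags(text):
    """Drop the helper tags again, e.g. for the headings list."""
    return re.sub(r"</?i>", "", text)


raw = "===''The Times They Are A-Changin''' sessions, part 1==="
patched = patch_headings(raw)
print(patched)
# ===<i>The Times They Are A-Changin'</i> sessions, part 1===
print(strip_italic_tags(patched))
# ===The Times They Are A-Changin' sessions, part 1===
```

Headings with an even number of quotes, such as `===''Saved'' sessions===`, pass through unchanged, so the patch stays limited to the problematic case.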
It is perhaps not the cleanest solution, but we cannot account for all such cases, and this workaround does solve this particular one for now. We can extend it to cover other such cases if and when needed.
The extracted headings contain a heading which includes all the text from:
`The Times They Are A-Changin sessions, part 2===`
to `===Saved'' sessions`