earwig / mwparserfromhell

A Python parser for MediaWiki wikicode
https://mwparserfromhell.readthedocs.io/
MIT License

strip_code method not returning bold and jp texts. #272

Closed (mooncell07 closed this issue 3 years ago)

mooncell07 commented 3 years ago

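import mwparserfromhell
import requests

# Fetch the raw wikitext of the latest revision of the page.
req = requests.get("https://typemoon.fandom.com/api.php?action=query&rvslots=main&rvlimit=1&titles=EMIYA_(Archer)&formatversion=2&format=json&rvprop=content&prop=revisions").json()["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]

# Strip the markup and print the first three lines joined together.
print("".join(mwparserfromhell.parse(req).strip_code().splitlines()[:3]))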


- Expected Output:

EMIYA (エミヤ?), Class Name Archer (アーチャー, Āchā?), is the Archer-class Servant of Rin Tohsaka in the Fifth Holy Grail War of Fate/stay night.
He is one of the Servants of Ritsuka Fujimaru of the Grand Orders conflicts of Fate/Grand Order.

- Actual output from the code above:

, Class Name , is the Archer-class Servant of Rin Tohsaka in the Fifth Holy Grail War of Fate/stay night.He is one of the Servants of Ritsuka Fujimaru of the Grand Orders conflicts of Fate/Grand Order.

earwig commented 3 years ago

This is because strip_code() strips templates by default and that page is using templates to produce that text. You can use strip_code(keep_template_params=True) to avoid this.

mooncell07 commented 3 years ago

Thanks, it works now!