5j9 / wikitextparser

A Python library to parse MediaWiki WikiText
GNU General Public License v3.0
285 stars, 22 forks

Getting IndexError from plain_text of WikiList when parameter replace_templates is callable #130

Closed by jstzwj 6 months ago

jstzwj commented 6 months ago

I found that the plain_text method of WikiList throws an IndexError if I pass a function as the replace_templates parameter. Here are my code and the wikitext:

import json

import wikitextparser


def fn_replace_templates(template: wikitextparser.Template):
    # Replace each template with its name.
    return template.name


def read_json(file_path: str):
    # Helper that is not used in this reproduction.
    with open(file_path, 'r', encoding="utf-8") as file:
        data = json.load(file)
    return data


if __name__ == "__main__":
    with open("mini.txt", "r", encoding="utf-8") as f:
        content = f.read()
    data = wikitextparser.parse(content)
    sections = data.sections[2].get_sections(include_subsections=False)
    meanings_list = sections[1].get_lists()
    # This call raises IndexError when replace_templates is a callable.
    plaintext = meanings_list[0].plain_text(replace_templates=fn_replace_templates)
    print(plaintext)

mini.txt:

"{{also|monð}}
==English==
{{wikipedia}}

===Noun===
{{en-noun|s|month|pl2qual=rare}}

# A [[period]] into which a [[year]] is divided, historically based on the phases of the moon.
#: {{ux|en|July is my favourite '''month'''.}}

It seems to be related to an inconsistency between the span of the WikiList and the span of the Template. I found that lst is indexed from the start position of the WikiList, while the template span is counted from the beginning of the entire text. If I subtract the WikiList start offset from the start and end fields of the template span, it should work correctly. However, I'm not sure how to fix this in a more elegant way.
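To illustrate the mismatch without depending on the library, here is a minimal sketch in plain Python. It is a simplified model and not wikitextparser's actual code: replace_span, the sample text, and all variable names are made up for this example. It only shows why a span measured from the start of the whole text goes out of range when applied to a character list built from the list's own substring, and how subtracting the list's start offset avoids that.

```python
def replace_span(text, start, end, replacement):
    """Replace text[start:end] via per-character list assignment.

    Hypothetical helper for illustration only; it is not part of
    wikitextparser.
    """
    chars = list(text)
    chars[start] = replacement  # IndexError if start is out of range
    for i in range(start + 1, end):
        chars[i] = ""  # blank out the rest of the replaced span
    return "".join(chars)


full_text = (
    "{{also|x}}\n"
    "==English==\n"
    "===Noun===\n"
    "# A period.\n"
    "#: {{ux|en|July}}\n"
)

# The list is only a slice of the full text.
list_start = full_text.index("# A")
list_text = full_text[list_start:]

# Template span measured from the beginning of the full text.
tpl_start = full_text.index("{{ux")
tpl_end = full_text.index("}}", tpl_start) + 2

# Using the absolute span on the list's own characters goes out of range.
try:
    replace_span(list_text, tpl_start, tpl_end, "ux")
    absolute_failed = False
except IndexError:
    absolute_failed = True

# Shifting the span by the list's start offset makes it relative to the list.
fixed = replace_span(
    list_text, tpl_start - list_start, tpl_end - list_start, "ux"
)

print(absolute_failed)
print(repr(fixed))
```

In this toy model the offset subtraction is the whole fix; whether the library should shift the spans or build its character list from the full text is a separate design question that this sketch does not answer.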

5j9 commented 6 months ago

Fix was released as v0.55.10.