When I parse the string inside the <text> element of a Wikimedia dump and call plain_text() on the result, the following error is displayed:
Error parsing text: 'NoneType' object has no attribute 'end'
Example
import wikitextparser as wtp  # assumed: wtp is the usual alias for the wikitextparser package
text = """Text: ''[https://<!---->{{#switch:{{{3|{{{type|movie}}}}}}<!-- the parameter"type"is"movie"by default -->|movie=movie.douban.com/subject/{{{1|{{{id|{{#if:{{#property:P4529}}|{{#property:P4529}}|}}}}}}}}|book=book.douban.com/subject/{{{1|{{{id|}}}}}}|music=music.douban.com/subject/{{{1|{{{id|}}}}}}|www.douban.com/{{{3|{{{type|}}}}}}/{{{1|{{{id|}}}}}}<!-- default -->}}/<!---->{{#if:{{{2|{{{title|}}}}}}|{{{2|{{{title|}}}}}}|{{PAGENAMEBASE}}}}]''<!-- the parameter"title"is the current Wikipedia page's title by default-->at [[Douban]] {{in lang|zh}}<includeonly>{{#switch:{{{3|{{{type|movie}}}}}}|movie={{EditAtWikidata|pid=P4529|{{{1|{{{id|}}}}}}}}{{#if:{{{1|{{{id|}}}}}}{{#property:P4529}}||{{main other|[[Category:Douban template with no id set]]}}}}|}}</includeonly><noinclude>{{Documentation}}</noinclude>"""
parsed = wtp.parse(text)
plain_text = parsed.plain_text()
Error parsing text: 'NoneType' object has no attribute 'end'
Is this particular dump entry malformed, or is this an edge case in the parser?
Wikimedia dump
<text bytes="821" xml:space="preserve">
''[https://<!--
-->{{#switch:{{{3|{{{type|movie}}}}}}<!-- the parameter "type" is "movie" by default -->
|movie=movie.douban.com/subject/{{{1|{{{id|{{#if:{{#property:P4529}}|{{#property:P4529}}|}}}}}}}}
|book=book.douban.com/subject/{{{1|{{{id|}}}}}}
|music=music.douban.com/subject/{{{1|{{{id|}}}}}}
|www.douban.com/{{{3|{{{type|}}}}}}/{{{1|{{{id|}}}}}}<!-- default -->
}}/<!--
--> {{#if:{{{2|{{{title|}}}}}}|{{{2|{{{title|}}}}}}|{{PAGENAMEBASE}}}}]''<!-- the parameter "title" is the current Wikipedia page's title by default
--> at [[Douban]] {{in lang|zh}}<includeonly>{{#switch:{{{3|{{{type|movie}}}}}}|movie={{EditAtWikidata|pid=P4529|{{{1|{{{id|}}}}}}}}{{#if:{{{1|{{{id|}}}}}}{{#property:P4529}}||{{main other|[[Category:Douban template with no id set]]}}}}|}}</includeonly><noinclude>{{Documentation}}</noinclude>
</text>
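Until the cause is known, one possible workaround when processing a whole dump is to record the pages that fail instead of aborting on the first one. The following is only a minimal sketch: `convert_pages` and `to_plain` are hypothetical names introduced here, not part of any library. The conversion function is passed in as a parameter, so the sketch itself depends only on the standard library.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def convert_pages(
    pages: Iterable[Tuple[str, str]],
    to_plain: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Convert (title, wikitext) pairs, collecting failures instead of aborting.

    `to_plain` is whatever function turns wikitext into plain text
    (hypothetical parameter; supply your own).
    """
    converted: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for title, wikitext in pages:
        try:
            converted[title] = to_plain(wikitext)
        except Exception as exc:  # broad on purpose: every parser error is recorded
            failures.append((title, f"{type(exc).__name__}: {exc}"))
    return converted, failures
```

With the code from the example above, `to_plain` would presumably be `lambda s: wtp.parse(s).plain_text()`. The returned `failures` list then gives the titles and error messages of the pages that triggered the problem, which may help narrow down whether only templates like this one are affected.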