attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0
3.76k stars 969 forks

Never finishes and even debug gets stuck in a loop #309

Open number435398 opened 1 year ago

number435398 commented 1 year ago

I ran it on the 4-2023 wikibooks XML file. After it outputs roughly this line:

WARNING: Template errors in article 'Précis of epistemology/ZFC is false' (424592): title(1) recursion(0, 0, 0)

it stops producing any output. If I enable debug, it keeps printing the following output repeatedly:

==Usage==
A close read of the shows the linking string given as <code><nowiki>{{{1}}} or URL= or url=</nowiki></code> is the subpage address after the TrainzWiki Help TOC-page listing N3V help articles. 
* Like most every Trainz template it may be given the pretty-text parameter ''''|p=''' ''some string'' which forms a priority output, otherwise, the template is designed to replace underscores in a url with spaces and display the page title, just as if it were a local page link on any Wikimedia based Wiki project.
* It is also designed for placeholder parameter '''<code><nowiki>{{{2}}}</nowiki></code>''' to support the more convenient pipe-separated call normally seen in wikimarkup language:
:: <code><nowiki>{{HT|</nowiki>''some_underscored_url_pagename'' '''|'''''Some pretty name'''<nowiki>}}'''</nowiki> </code>
For this template, since the subpage names AND the Trainz Wiki divisions are named for topic clarity, the shorter form using the page name is likely to be what is desired:
:: <code><nowiki>{{HT|</nowiki>''some_underscored_url_pagename''<nowiki>}}</nowiki></code>
<br>{{#if:{{{inhibit|1}}}||<br><br>}}
</noinclude><span class="plainlinks">[http://online.ts2009.com/mediaWiki/index.php/HowTo/{{{1|{{{URL|{{{url|}}}}}}}}} {{{p|{{{2|{{#replace:|{{{1|}}}|_| }}}}}}}}]</span>
DEBUG: subst tpl (30, 4) URL
DEBUG: subst tpl (30, 1) 1
DEBUG: subst tpl (29, 1) {{{2|{{#replace:|{{{1|{{{URL|{{{url|}}}}}}}}}|_| }}}}}
DEBUG: <templateParams: </nowiki>''some_underscored_url_pagename''<nowiki>
DEBUG: subst tpl (29, 1) 1
DEBUG: subst tpl (30, 4) {{{url|}}}
DEBUG: subst tpl (30, 1) 2
DEBUG: subst tpl (29, 2) 2
DEBUG: subst tpl (30, 5) url
DEBUG:    templateParams> </nowiki>''some_underscored_url_pagename''<nowiki>
DEBUG: subst tpl (30, 1) inhibit
DEBUG: subst tpl (29, 1) p
DEBUG: subst tpl (30, 5) 
DEBUG: subst tpl (29, 1) 1
DEBUG: subst tpl (30, 1) 1
DEBUG: subst tpl (29, 1) {{{2|{{#replace:|{{{1|{{{URL|{{{url|}}}}}}}}}|_| }}}}}
DEBUG: subst tpl (30, 1) 1
DEBUG: subst tpl (30, 1) 1
DEBUG: subst tpl (29, 1) 2
DEBUG: subst tpl (29, 0)   template forms an external link to an N3V Wiki HELP namespace page given only the page title. 
:* It add NS and HowTo parameters to link to Help namespace like:
http://online.ts2009.com/mediaWiki/index.php/Help:Surveyor_Tools 
&nbsp;
;internal coding
<pre>
<span class="plainlinks">[http://online.ts2009.com/mediaWiki/index.php/HowTo/{{{1|{{{URL|{{{url|}}}}}}}}} {{{p|{{{2|{{#replace:|{{{1|{{{URL|{{{url|}}}}}}}}}|_| }}}}}}}}]</span>
</pre>
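
For anyone reading the debug output above: the template body nests MediaWiki parameter placeholders of the form `{{{name|default}}}`, where the default can itself be another placeholder. Below is a minimal Python sketch of how such nested defaults resolve, with a depth cap so that pathological nesting fails fast instead of running forever. It is not wikiextractor's code; `expand_params`, `max_depth`, and the helper names are hypothetical, and it deliberately ignores `{{...}}` templates and parser functions such as `{{#replace:...}}`.

```python
def _match_close(text, start):
    """Return the index just past the '}}}' that closes the '{{{' at start, or None."""
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("{{{", i):
            depth += 1
            i += 3
        elif text.startswith("}}}", i):
            depth -= 1
            i += 3
            if depth == 0:
                return i
        else:
            i += 1
    return None  # unbalanced braces


def _split_default(inner):
    """Split 'name|default' at the first pipe that is not inside a nested placeholder."""
    depth = 0
    i = 0
    while i < len(inner):
        if inner.startswith("{{{", i):
            depth += 1
            i += 3
        elif inner.startswith("}}}", i):
            depth -= 1
            i += 3
        elif inner[i] == "|" and depth == 0:
            return inner[:i], inner[i + 1:]
        else:
            i += 1
    return inner, None  # no default given


def expand_params(text, params, depth=0, max_depth=10):
    """Expand {{{name|default}}} placeholders in text (hypothetical helper)."""
    if depth > max_depth:
        # Assumption for this sketch: give up loudly instead of recursing further.
        raise RecursionError("parameter nesting deeper than max_depth")
    out = []
    i = 0
    while i < len(text):
        if text.startswith("{{{", i):
            end = _match_close(text, i)
            if end is None:
                out.append(text[i:])  # unbalanced: keep the rest literally
                break
            name, default = _split_default(text[i + 3:end - 3])
            name = name.strip()
            if name in params:
                out.append(params[name])
            elif default is not None:
                out.append(expand_params(default, params, depth + 1, max_depth))
            else:
                out.append(text[i:end])  # undefined, no default: leave as-is
            i = end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


link = "HowTo/{{{1|{{{URL|{{{url|}}}}}}}}}"
print(expand_params(link, {"url": "Surveyor_Tools"}))
```

With only `url` supplied, the lookup falls through `1` and `URL` to the innermost placeholder; with no parameters at all, the empty innermost default is used.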