eccsup / jwpl

Automatically exported from code.google.com/p/jwpl

Parser fails when getPlainText() called on [[Table of mathematical symbols]] #115

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The parser fails when getPlainText() is called on the page [[Table of
mathematical symbols]] from the English Wikipedia. At least, it crashes on the
22 August 2009 version I'm using, though today's version looks much the same.
This page uses many nested templates to produce a table; the parser seems to
get stuck on template lines such as

{{row of table of mathematical symbols
 | symbol   ={{Unicode|∓}}

Here is a minimal example reproducing the problem:

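import de.tudarmstadt.ukp.wikipedia.api.DatabaseConfiguration;
import de.tudarmstadt.ukp.wikipedia.api.WikiConstants.Language;
import de.tudarmstadt.ukp.wikipedia.api.Wikipedia;

// Connection settings for the JWPL database built from the 22 August 2009 dump
DatabaseConfiguration dbc = new DatabaseConfiguration();
dbc.setHost("bender");
dbc.setDatabase("wikiapi_en_20090822");
dbc.setUser("myusername");
dbc.setPassword("mypassword");
dbc.setLanguage(Language.english);
// Fails with the WikiApiException shown in the output below
new Wikipedia(dbc).getPage("Table of mathematical symbols").getPlainText();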

Output:
2013-06-28 15:30:41  INFO [main] (Wikipedia) - Creating Wikipedia object.
2013-06-28 15:30:44 ERROR [main] (Compiler) - Parsing failed!
ParserShouldNotBeHereException: Table_of_mathematical_symbols:217:1: <!> " | symbol   ={{U"
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.throwShouldNotBeHereException(LazyRatsParser.java:18460)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pThrowShouldNotBeHereException(LazyRatsParser.java:12783)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pShouldNotBeHere(LazyRatsParser.java:12746)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowElement$$Choice1(LazyRatsParser.java:9638)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowElement(LazyRatsParser.java:9531)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowContentStar(LazyRatsParser.java:9451)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowBody(LazyRatsParser.java:9365)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowTransient(LazyRatsParser.java:9303)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowMemoized$1(LazyRatsParser.java:9247)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowMemoized(LazyRatsParser.java:9233)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRow(LazyRatsParser.java:9198)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableElement(LazyRatsParser.java:6975)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableContentStar(LazyRatsParser.java:6907)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableBody(LazyRatsParser.java:6717)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableTransient(LazyRatsParser.java:6667)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableMemoized$1(LazyRatsParser.java:6632)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableMemoized(LazyRatsParser.java:6618)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTable(LazyRatsParser.java:6584)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pLineStartProd(LazyRatsParser.java:642)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphStopper(LazyRatsParser.java:985)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.p$$Shared1(LazyRatsParser.java:1129)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pPreParaWs(LazyRatsParser.java:1075)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphTransient(LazyRatsParser.java:864)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphMemoized$1(LazyRatsParser.java:826)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphMemoized(LazyRatsParser.java:812)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraph(LazyRatsParser.java:777)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBlock(LazyRatsParser.java:531)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBlockContent(LazyRatsParser.java:462)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionContentStar(LazyRatsParser.java:5588)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSection(LazyRatsParser.java:5309)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionsTransient(LazyRatsParser.java:5231)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionsMemoized$1(LazyRatsParser.java:5193)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionsMemoized(LazyRatsParser.java:5179)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSections(LazyRatsParser.java:5144)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pLineStartProd(LazyRatsParser.java:744)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBlock(LazyRatsParser.java:521)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBolBlockContent(LazyRatsParser.java:412)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pArticle(LazyRatsParser.java:168)
    at org.sweble.wikitext.lazy.LazyParser.parseArticle(LazyParser.java:76)
    at org.sweble.wikitext.engine.Compiler.parse(Compiler.java:661)
    at org.sweble.wikitext.engine.Compiler.postprocess(Compiler.java:290)
    at de.tudarmstadt.ukp.wikipedia.api.Page.getCompiledPage(Page.java:621)
    at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:600)
    at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:581)
    at de.tudarmstadt.ukp.alignment.JWPLBug.main(JWPLBug.java:23)
Exception in thread "main" de.tudarmstadt.ukp.wikipedia.api.exception.WikiApiException: org.sweble.wikitext.engine.CompilerException: Parsing failed!
    at de.tudarmstadt.ukp.wikipedia.api.Page.getCompiledPage(Page.java:623)
    at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:600)
    at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:581)
    at de.tudarmstadt.ukp.alignment.JWPLBug.main(JWPLBug.java:23)
Caused by: org.sweble.wikitext.engine.CompilerException: Parsing failed!
    at org.sweble.wikitext.engine.Compiler.parse(Compiler.java:684)
    at org.sweble.wikitext.engine.Compiler.postprocess(Compiler.java:290)
    at de.tudarmstadt.ukp.wikipedia.api.Page.getCompiledPage(Page.java:621)
    ... 3 more
Caused by: ParserShouldNotBeHereException: Table_of_mathematical_symbols:217:1: <!> " | symbol   ={{U"
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.throwShouldNotBeHereException(LazyRatsParser.java:18460)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pThrowShouldNotBeHereException(LazyRatsParser.java:12783)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pShouldNotBeHere(LazyRatsParser.java:12746)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowElement$$Choice1(LazyRatsParser.java:9638)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowElement(LazyRatsParser.java:9531)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowContentStar(LazyRatsParser.java:9451)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowBody(LazyRatsParser.java:9365)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowTransient(LazyRatsParser.java:9303)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowMemoized$1(LazyRatsParser.java:9247)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRowMemoized(LazyRatsParser.java:9233)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableRow(LazyRatsParser.java:9198)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableElement(LazyRatsParser.java:6975)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableContentStar(LazyRatsParser.java:6907)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableBody(LazyRatsParser.java:6717)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableTransient(LazyRatsParser.java:6667)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableMemoized$1(LazyRatsParser.java:6632)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTableMemoized(LazyRatsParser.java:6618)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pTable(LazyRatsParser.java:6584)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pLineStartProd(LazyRatsParser.java:642)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphStopper(LazyRatsParser.java:985)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.p$$Shared1(LazyRatsParser.java:1129)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pPreParaWs(LazyRatsParser.java:1075)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphTransient(LazyRatsParser.java:864)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphMemoized$1(LazyRatsParser.java:826)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraphMemoized(LazyRatsParser.java:812)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pParagraph(LazyRatsParser.java:777)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBlock(LazyRatsParser.java:531)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBlockContent(LazyRatsParser.java:462)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionContentStar(LazyRatsParser.java:5588)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSection(LazyRatsParser.java:5309)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionsTransient(LazyRatsParser.java:5231)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionsMemoized$1(LazyRatsParser.java:5193)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSectionsMemoized(LazyRatsParser.java:5179)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pSections(LazyRatsParser.java:5144)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pLineStartProd(LazyRatsParser.java:744)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBlock(LazyRatsParser.java:521)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pBolBlockContent(LazyRatsParser.java:412)
    at org.sweble.wikitext.lazy.parser.LazyRatsParser.pArticle(LazyRatsParser.java:168)
    at org.sweble.wikitext.lazy.LazyParser.parseArticle(LazyParser.java:76)
    at org.sweble.wikitext.engine.Compiler.parse(Compiler.java:661)
    ... 5 more
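
The output above nests three exceptions: a WikiApiException wrapping a CompilerException, which in turn wraps the ParserShouldNotBeHereException that names the failing line. When getPlainText() is called over many pages, it may be handy to log only that innermost cause. The following is a minimal sketch using nothing but java.lang.Throwable; the class name RootCause and the exceptions built in main are stand-ins for illustration, not JWPL or Sweble types.

```java
public final class RootCause {
    private RootCause() {}

    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for the three layers seen in the trace (hypothetical types).
        Exception parser = new IllegalStateException("Table_of_mathematical_symbols:217:1");
        Exception compiler = new RuntimeException("Parsing failed!", parser);
        Exception api = new Exception(compiler);

        // Prints the message of the innermost exception only.
        System.out.println(rootCause(api).getMessage());
    }
}
```

In real code, the argument to rootCause would be the exception caught around the getPlainText() call.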

What version of the product are you using? On what operating system?
1.4.0

Original issue reported on code.google.com by tristan.miller@nothingisreal.com on 28 Jun 2013 at 1:35

GoogleCodeExporter commented 9 years ago
This error is thrown by the underlying Sweble parser.
It would be best to open an issue there:
https://github.com/sweble/sweble-wikitext/issues

Original comment by torsten....@gmail.com on 30 Jun 2013 at 2:21
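
Until the parser handles such pages, one possible stopgap is to remove template calls from the raw wikitext before further processing, since the nested {{...}} lines are where the parser stops. The sketch below is an assumption-laden illustration, not part of JWPL or Sweble: the class TemplateStripper and the sample input are made up, it only tracks double braces, it does not treat triple-brace parameters ({{{x}}}) specially, and an unclosed template swallows the rest of the text.

```java
public final class TemplateStripper {
    private TemplateStripper() {}

    /** Removes every balanced {{...}} span, including templates nested inside it. */
    public static String stripTemplates(String wikitext) {
        StringBuilder out = new StringBuilder(wikitext.length());
        int depth = 0; // number of currently open "{{"
        int i = 0;
        while (i < wikitext.length()) {
            if (wikitext.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && wikitext.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                // Keep characters only when outside all templates.
                if (depth == 0) {
                    out.append(wikitext.charAt(i));
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up input with one template nested in another.
        String sample = "before {{outer\n | symbol ={{inner|\u2213}}\n}} after";
        System.out.println(stripTemplates(sample));
    }
}
```

The result is still wikitext rather than plain text, and all table content produced by the templates is lost, so this only helps when the template output is not needed.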