PiRSquared17 / jwpl

Automatically exported from code.google.com/p/jwpl

StringIndexOutOfBoundsException when parsing a specific Wikipedia Page #63

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run DataMachine to populate the Wikipedia database, using the dump from 2011-11-15.
2. Attempt to get the parsed textual content of the article "Conditional comment" like this:

Page page = wiki.getPage("Conditional_comment");
ParsedPage parsedPage = page.getParsedPage();
System.out.println(parsedPage.getText());

What is the expected output? What do you see instead?
I expect to see the Wiki markup converted into plain text. Instead I see an 
exception:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.AbstractStringBuilder.substring(Unknown Source)
    at java.lang.StringBuilder.substring(Unknown Source)
    at de.tudarmstadt.ukp.wikipedia.parser.mediawiki.SpanManager.substring(SpanManager.java:155)
    at de.tudarmstadt.ukp.wikipedia.parser.mediawiki.ModularParser.parseSpecifiedTag(ModularParser.java:898)
    at de.tudarmstadt.ukp.wikipedia.parser.mediawiki.ModularParser.parseSpecifiedTag(ModularParser.java:862)
    at de.tudarmstadt.ukp.wikipedia.parser.mediawiki.ModularParser.parse(ModularParser.java:356)

What version of the product are you using? On what operating system?

Please provide any additional information below.
I have attached a modification of the Demo program; it contains the Wiki markup 
of the page in question. You can run this program; it exhibits the same 
behavior.
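
The stack trace in this report ends in StringBuilder.substring being called with an invalid index. The thread does not say what in ModularParser.parseSpecifiedTag produces that index, so the following is only a generic illustration of how this class of error tends to arise: an indexOf lookup returns -1 for a missing delimiter and the result is passed on to substring unchecked. The class and method names (NegativeIndexDemo, extractNaive, extractGuarded) are hypothetical and are not part of JWPL.

```java
// Illustration only: this is NOT JWPL's ModularParser code.
class NegativeIndexDemo {

    /** Naive extraction: assumes the opening and closing delimiters are always present. */
    static String extractNaive(StringBuilder text, String open, String close) {
        int start = text.indexOf(open);  // -1 if the opening delimiter is missing
        int end = text.indexOf(close);   // -1 if the closing delimiter is missing
        // Throws StringIndexOutOfBoundsException when either lookup failed.
        return text.substring(start + open.length(), end);
    }

    /** Guarded extraction: returns null instead of throwing when a delimiter is missing. */
    static String extractGuarded(StringBuilder text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return null;
        }
        int bodyStart = start + open.length();
        int end = text.indexOf(close, bodyStart);
        if (end < 0) {
            return null;
        }
        return text.substring(bodyStart, end);
    }

    public static void main(String[] args) {
        StringBuilder complete = new StringBuilder("a <!-- note --> b");
        StringBuilder unclosed = new StringBuilder("a <!-- note b");

        System.out.println(extractGuarded(complete, "<!--", "-->"));
        System.out.println(extractGuarded(unclosed, "<!--", "-->"));
        try {
            extractNaive(unclosed, "<!--", "-->");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive version failed: " + e.getMessage());
        }
    }
}
```

The exact exception message depends on the Java version, so only the exception type should be relied on.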

Original issue reported on code.google.com by robertop...@gmail.com on 8 Dec 2011 at 6:35

Attachments:

GoogleCodeExporter commented 9 years ago
I forgot to add: I am using the JAR file de.tudarmstadt.ukp.wikipedia.api-0.7.2.jar on MS Windows (Windows 7).

Original comment by robertop...@gmail.com on 8 Dec 2011 at 6:37

GoogleCodeExporter commented 9 years ago
This issue is fixed in the current SNAPSHOT version (0.8.0-SNAPSHOT). 
I plan to make a new release soon.
If you need the new version urgently, you can check it out and build it on your 
own. If that's not an option, I could also send you a jar.
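
For anyone who has to stay on 0.7.2 for a while, one possible stopgap is to guard the parse call so that a single problematic page does not abort a whole run. This is a minimal sketch, not something taken from JWPL: SafeParse and textOrFallback are hypothetical names, and the lambda in main is a stand-in for the reporter's page.getParsedPage().getText() call.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of JWPL.
class SafeParse {

    /** Runs the parse step and returns the fallback if it throws StringIndexOutOfBoundsException. */
    static String textOrFallback(Supplier<String> parseStep, String fallback) {
        try {
            return parseStep.get();
        } catch (StringIndexOutOfBoundsException e) {
            System.err.println("Parser failed, using fallback: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Stand-in for: () -> page.getParsedPage().getText()
        // "abc".substring(2, 1) throws because the begin index exceeds the end index.
        String text = textOrFallback(() -> "abc".substring(2, 1), "");
        System.out.println("[" + text + "]");
    }
}
```

This only hides the symptom for the affected page; upgrading to a fixed version remains the actual fix.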

Original comment by oliver.ferschke on 12 Dec 2011 at 9:27

GoogleCodeExporter commented 9 years ago
I just created the JWPL 0.8.0 release.
This should solve your problems.

Original comment by oliver.ferschke on 12 Dec 2011 at 11:51

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 16 Feb 2012 at 1:19