PiRSquared17 / jwpl

Automatically exported from code.google.com/p/jwpl

Title constructor mangles page titles with parentheses #81

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The Wikipedia.getPage() method silently mangles certain page titles passed to 
it, so that it looks up an entirely different page.  Specifically, every page 
title matching the regular expression ^(.*)(.)(\(.+\))$ is affected: group(2), 
the single character immediately before the opening parenthesis, gets replaced 
with an underscore.  For example, a call to getPage("abc(def)") fails with the 
following exception, even if a page with that title actually exists in the 
database:

Exception in thread "main"
de.tudarmstadt.ukp.wikipedia.api.exception.WikiPageNotFoundException: No page 
with name Ab_(def) was found.
    at de.tudarmstadt.ukp.wikipedia.api.Page.fetchByTitle(Page.java:186)
    at de.tudarmstadt.ukp.wikipedia.api.Page.<init>(Page.java:111)
    at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.getPage(Wikipedia.java:109)
    at WordNetPlusPlus.main(WordNetPlusPlus.java:42)

I noticed this bug when trying to fetch the article "401(k)".  Fetching it is 
impossible because JWPL looks for the nonexistent "40_(k)" instead.

The bug appears to be in the constructor Title(String), which uses the regular
expression (.*?).\((.+?)\)$ to parse the string.  The lone "." before the
parenthesis isn't captured by any group, so the character it matches is
discarded.
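
To make the diagnosis concrete, here is a minimal standalone sketch that applies the quoted pattern outside of JWPL. It is not the library's actual code: the class and method names are hypothetical, and rejoining the two groups with "_(" is an assumption inferred from the exception message above. The stricter pattern is only one possible way to avoid the problem and is not necessarily what the attached patch does. The first-letter capitalisation seen in the exception is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JWPL.
final class TitleParenSketch {

    // Pattern quoted in the report: the bare "." consumes one character
    // that no group captures, so that character is lost.
    private static final Pattern REPORTED =
            Pattern.compile("(.*?).\\((.+?)\\)$");

    // Assumed alternative: only a space or an underscore may separate the
    // name from the parenthesised suffix.
    private static final Pattern STRICTER =
            Pattern.compile("^(.*?)[ _]\\((.+?)\\)$");

    // Assumption: the two captured groups are rejoined as name + "_(" + suffix + ")".
    private static String rejoin(Pattern pattern, String title) {
        Matcher m = pattern.matcher(title);
        if (m.find()) {
            return m.group(1) + "_(" + m.group(2) + ")";
        }
        return title; // no parenthesised suffix recognised; leave unchanged
    }

    static String withReportedPattern(String title) {
        return rejoin(REPORTED, title);
    }

    static String withStricterPattern(String title) {
        return rejoin(STRICTER, title);
    }

    public static void main(String[] args) {
        System.out.println(withReportedPattern("401(k)"));           // 40_(k)
        System.out.println(withReportedPattern("abc(def)"));         // ab_(def)
        System.out.println(withStricterPattern("401(k)"));           // 401(k)
        System.out.println(withStricterPattern("Mercury (planet)")); // Mercury_(planet)
    }
}
```

With the stricter pattern, a title such as "401(k)" does not match at all and passes through untouched, while a title with a real separator before the parenthesis is still split as before.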

Original issue reported on code.google.com by tristan.miller@nothingisreal.com on 20 Feb 2012 at 9:33

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 20 Feb 2012 at 9:37

GoogleCodeExporter commented 9 years ago
Attached patch seems to fix the problem.  I added a new JUnit test case for the 
"401(k)" article title as well.

Original comment by tristan.miller@nothingisreal.com on 20 Feb 2012 at 9:59

Attachments:

GoogleCodeExporter commented 9 years ago
Thank you. I will include the patch when I'm back in Germany.

Original comment by oliver.ferschke on 1 Mar 2012 at 1:24

GoogleCodeExporter commented 9 years ago
Issue 87 has been merged into this issue.

Original comment by oliver.ferschke on 22 Apr 2012 at 10:52

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r577.

Original comment by oliver.ferschke on 9 May 2012 at 9:52

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 9 May 2012 at 9:53

GoogleCodeExporter commented 9 years ago
Issue 93 has been merged into this issue.

Original comment by oliver.ferschke on 16 May 2012 at 10:07

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 15 Aug 2012 at 9:21