dkpro / dkpro-jwpl

DKPro JWPL (DKPro Java Wikipedia Library) is a free, Java-based application programming interface that facilitates access to all information in Wikipedia.
https://dkpro.github.io/dkpro-jwpl
Apache License 2.0

de.tudarmstadt.ukp.wikipedia.parser.Link.getText may return empty string #90

Open daxenberger opened 8 years ago

daxenberger commented 8 years ago

Originally reported on Google Code with ID 96

I noticed that getText() returns an empty string when a category link has nothing
but whitespace after the | character. Take, for example, the 'Anarchism' page. It has
six categories defined in its wikitext:
[[Category:Anarchism| ]]
[[Category:Political culture]]
[[Category:Political ideologies]]
[[Category:Social theories]]
[[Category:Anti-fascism]]
[[Category:Greek loanwords]]

The following code:
for (Link link : page.getCategories()) {
  System.out.println(">" + link.getText() + "<");
}

will print:
><
>Category:Political culture<
>Category:Political ideologies<
>Category:Social theories<
>Category:Anti-fascism<
>Category:Greek loanwords<

Note the first line. The text is empty because the string after the | character
contains only a space.
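
To make the cause concrete, here is a minimal sketch of how a piped link splits into target and text. It is an illustration only, not JWPL's parser code: the class name PipedLinkDemo and the method splitLink are hypothetical, and the sketch assumes the text after the first | is trimmed of whitespace.

```java
// Illustrative only: NOT the JWPL parser. Names here are hypothetical.
class PipedLinkDemo {

    /** Splits the inner content of a [[...]] link into {target, text}. */
    static String[] splitLink(String inner) {
        int pipe = inner.indexOf('|');
        if (pipe < 0) {
            // No pipe: the target doubles as the link text.
            String t = inner.trim();
            return new String[] { t, t };
        }
        String target = inner.substring(0, pipe).trim();
        // Assumption: whitespace after the pipe is trimmed away.
        String text = inner.substring(pipe + 1).trim();
        return new String[] { target, text };
    }

    public static void main(String[] args) {
        String[] piped = splitLink("Category:Anarchism| ");
        System.out.println(">" + piped[1] + "<");   // prints "><"

        String[] plain = splitLink("Category:Political culture");
        System.out.println(">" + plain[1] + "<");   // prints ">Category:Political culture<"
    }
}
```

In MediaWiki, the part after the pipe in a category link is a sort key rather than a display label, which is why a lone space appears there.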

I suggest that in such a case, we return the category "target" itself or the target
without the "Category:" string.
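
Until the library changes, the suggested fallback could be applied on the caller's side. The following is a minimal sketch, not a patch for JWPL: the class CategoryLinkText and the method displayText are hypothetical names, and the helper works on plain strings so that it does not depend on any JWPL class.

```java
// Hypothetical caller-side helper; not part of JWPL.
class CategoryLinkText {

    private static final String PREFIX = "Category:";

    /**
     * Returns the link text, or falls back to the target when the text is
     * null or blank. If stripPrefix is true, a leading "Category:" is
     * removed from the fallback value.
     */
    static String displayText(String text, String target, boolean stripPrefix) {
        if (text != null && !text.trim().isEmpty()) {
            return text; // a real label exists, keep it unchanged
        }
        if (target == null) {
            return "";   // nothing to fall back to
        }
        String t = target.trim();
        if (stripPrefix && t.startsWith(PREFIX)) {
            return t.substring(PREFIX.length());
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(displayText("", "Category:Anarchism", false));
        System.out.println(displayText("", "Category:Anarchism", true));
    }
}
```

Inside the loop above, this could be called as displayText(link.getText(), link.getTarget(), false), assuming Link exposes its target through a getter such as getTarget().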

What version of the product are you using? On what operating system?
Running latest release (0.9.1) on Linux.

Reported by jbabooa on 2012-05-20 12:28:29

daxenberger commented 8 years ago
This is more of a request for enhancement.

Reported by jbabooa on 2012-05-20 12:28:59

daxenberger commented 8 years ago
Thanks for the report. I will look into it and make the suggested change.

However, be aware that as of the next release of JWPL, the parser will no longer be
supported. It has been moved into its own module.
We will still apply patches provided by the community, but we will not develop the
parser any further.
We now use the Sweble parser (www.sweble.org), which we have also integrated into JWPL Core.

Reported by oliver.ferschke on 2012-05-29 10:17:10

daxenberger commented 8 years ago

Reported by oliver.ferschke on 2012-05-29 10:23:00