dkpro / dkpro-jwktl

Java Wiktionary Library
http://dkpro.org/dkpro-jwktl/
Apache License 2.0

Polish word forms #44

Open kurmasz opened 7 years ago

kurmasz commented 7 years ago

When analyzing the English Wiktionary, getWordForm() always returns null for IWiktionaryEntry objects with a language of "Polish", even if the corresponding entry has a declension/conjugation table. Is this the expected behavior?

(When I tried to analyze the Polish Wiktionary, I got "Exception in thread "main" de.tudarmstadt.ukp.jwktl.api.WiktionaryException: Language Polish is not supported", so I assume that behavior is expected.)

Tbsc commented 7 years ago

(Note: I'm not related in any way to this project, and this is just what I understood from looking at the code. And yes, I know April was more than 4 months ago.)

Regarding your first question, word forms in the English Wiktionary are handled by ENWordFormHandler, which can only handle English entries (there is ENNonEngWordFormHandler for non-English entries, but it only handles noun genders, not word forms). In theory, other languages could be supported, but because the library doesn't let you register external handlers and only gives you the parsed values (there is no way of getting the wikitext), I'm pretty sure it's not possible in practice (I encountered the same problem, and it's frustrating).
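To illustrate the dispatch problem described above, here is a minimal toy model in plain Java. It is not JWKTL code: FormHandler, EnglishOnlyHandler, and wordFormsFor are hypothetical names invented for this sketch, and the "table" is just a pipe-separated string. It only shows how an entry can end up with null word forms when the single registered handler rejects its language, even though the table text is there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: none of these types are JWKTL classes.
public class WordFormDispatchSketch {

    /** Hypothetical stand-in for a language-specific word-form handler. */
    interface FormHandler {
        boolean supports(String language);
        List<String> extract(String tableText);
    }

    /** Mimics a handler that only understands English entries. */
    static class EnglishOnlyHandler implements FormHandler {
        @Override
        public boolean supports(String language) {
            return "English".equals(language);
        }

        @Override
        public List<String> extract(String tableText) {
            // Split the pipe-separated toy table into individual forms.
            List<String> forms = new ArrayList<>();
            for (String part : tableText.split("\\|")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    forms.add(trimmed);
                }
            }
            return forms;
        }
    }

    /** Returns the extracted forms, or null when no handler supports the language. */
    static List<String> wordFormsFor(String language, String tableText,
                                     List<FormHandler> handlers) {
        for (FormHandler handler : handlers) {
            if (handler.supports(language)) {
                return handler.extract(tableText);
            }
        }
        // The table exists in the input, but nothing ever parses it.
        return null;
    }

    public static void main(String[] args) {
        List<FormHandler> handlers =
                Collections.<FormHandler>singletonList(new EnglishOnlyHandler());
        System.out.println(wordFormsFor("English", "walks|walking|walked", handlers));
        System.out.println(wordFormsFor("Polish", "kot|kota|kotu", handlers));
    }
}
```

In this toy model, supporting Polish would just mean adding another FormHandler to the list; the point of the comment above is that the real library offers no such registration hook.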

And yes, only the English, German and Russian Wiktionaries are supported (see onSiteInfoComplete() in WiktionaryArticleParser, which only handles those Wiktionary languages).

Edit: I just realized why external handlers aren't really possible: the parser only saves the parsed values to the database, so it's not that the wikitext is hidden, but rather that it doesn't exist there. Getting the wikitext would require modifying the library to store it in the database and then rebuilding the database.