Closed by chmeyer 9 years ago
Hi. If you access the etymologies using IWiktionaryEntry.getWordEtymology(), you will
obtain an IWikiString representation of the etymology. This class provides both a getPlainText()
and a getText() method to obtain a string representation of the etymology. I assume
that in your code you used the former (or an implicit toString(), which uses getPlainText(),
too). The latter, however, allows you to work with the full markup encoded in Wiktionary.
And yes: getPlainText() strips too aggressively for etymology strings. I'm not sure a plain
text representation is necessary at all if you have the markup version. I have been
experimenting with an EtymologyTemplateHandler for a while (you can find it in the
api.util.TemplateParser file). With this approach, it should be possible to analyze
etymology strings. It's far from perfect, but probably a good starting point. If you
make interesting changes to the JWKTL source code, I'm happy to integrate them. Just
reopen this ticket or start a new one. Best wishes!
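
For illustration, here is a minimal sketch of what working with the markup version could look like once you have the string returned by getText(). It is not the EtymologyTemplateHandler mentioned above and does not call JWKTL at all: the class name EtymologyTemplates, the method extractTemplates, and the sample markup string are all hypothetical, and the regular expression only handles simple, non-nested templates without wiki links inside their parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JWKTL.
public class EtymologyTemplates {

    // Matches a non-nested template such as {{etyl|la|en}} and captures its body.
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{([^{}]*)\\}\\}");

    /** Returns one list per template: the template name first, then its parameters. */
    public static List<List<String>> extractTemplates(String markup) {
        List<List<String>> templates = new ArrayList<>();
        Matcher matcher = TEMPLATE.matcher(markup);
        while (matcher.find()) {
            // A limit of -1 keeps trailing empty parameters instead of dropping them.
            templates.add(Arrays.asList(matcher.group(1).split("\\|", -1)));
        }
        return templates;
    }

    public static void main(String[] args) {
        // Assumed sample; in practice this would be the markup string
        // obtained from the etymology's getText() method.
        String markup = "From {{etyl|la|en}} {{m|la|aqua}}, from {{etyl|ine-pro|en}}.";
        for (List<String> template : extractTemplates(markup)) {
            String name = template.get(0);
            List<String> params = template.subList(1, template.size());
            System.out.println(name + " -> " + params);
        }
    }
}
```

Each template keeps its name and positional parameters, which getPlainText() would not preserve. Real etymology sections also contain nested templates and links, so a proper parser would be needed beyond this sketch.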
Reported by chmeyer.de on 2015-02-04 15:07:36
WontFix
Thanks, Christian! I wasn't planning to make changes, I was just trying to
see how much of the info in Wiktionary is formalized. I am allergic to
databases :)
all the best,
Vivi
Reported by nastase@fbk.eu on 2015-02-04 15:16:12
Originally reported on Google Code with ID 11
Reported by nastase@fbk.eu on 2015-01-26 11:28:43