Ron89 / thesaurus_query.vim

Multi-language Thesaurus Query and Replacement plugin for Vim/NeoVim
http://www.vim.org/scripts/script.php?script_id=5341
Apache License 2.0

Add wiktionary as backend #9

Closed petRUShka closed 7 years ago

petRUShka commented 8 years ago

There is Wiktionary:

It is a multilingual, web-based project to create a free content dictionary of all words in all languages. It is available in 172 languages and in Simple English. Because Wiktionary is not limited by print space considerations, most of Wiktionary's language editions provide definitions and translations of words from many languages, and some editions offer additional information typically found in thesauri and lexicons.

That would give the plugin support for 172 languages. I'm not sure about the quality, but it could be better for Russian, because jerk.ru isn't perfect (especially for verbs...).

Ron89 commented 8 years ago

Sorry for the poor performance of the backend, and thanks for the information. However, according to Wiktionary: "Whether Wikisaurus at en.wiktionary.org should contain entries for other languages than English is currently undecided or disputed. Entries for foreign languages could look like Wikisaurus:příbuzný and Wikisaurus:juoppo." And when I enter the URL https://en.wiktionary.org/wiki/Wikisaurus:Wikisaurus:ру́сский, Wiktionary reports that the Wikisaurus page has not been created yet.

Also, when I searched for Russian words in the Russian Wiktionary, I saw the category Синонимы (Synonyms) in the index, but there were no synonyms in it. So I believe they haven't started a Russian thesaurus service yet.
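For reference, the manual page check described above could also be scripted. This is only a rough sketch, assuming the standard MediaWiki query API at `/w/api.php` with `formatversion=2`; the helper names `build_query_url`, `page_exists` and `fetch_page_exists`, and the User-Agent string, are made up for illustration and are not part of the plugin.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(title, lang="en"):
    """Build a MediaWiki API URL asking for basic info about one page."""
    params = urlencode({
        "action": "query",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    })
    return "https://{}.wiktionary.org/w/api.php?{}".format(lang, params)


def page_exists(response):
    """Return True if the decoded API response describes an existing page.

    With formatversion=2, a page that does not exist carries
    "missing": true, and a malformed title carries "invalid": true.
    """
    pages = response.get("query", {}).get("pages", [])
    return bool(pages) and not any(
        p.get("missing") or p.get("invalid") for p in pages
    )


def fetch_page_exists(title, lang="en"):
    """Query Wiktionary over the network (assumes the site is reachable)."""
    request = Request(
        build_query_url(title, lang),
        # Hypothetical User-Agent, sent to identify the client.
        headers={"User-Agent": "thesaurus-backend-sketch/0.1"},
    )
    with urlopen(request) as reply:
        return page_exists(json.loads(reply.read().decode("utf-8")))
```

Something like `fetch_page_exists("Wikisaurus:juoppo")` would then tell a backend whether a thesaurus page is there before it tries to parse anything. It says nothing about whether the page actually lists synonyms.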

Oh, I think I should mention the issue of word stemming, too. It is a problem because stemming in Russian is much more complicated than in English, and many words in articles appear in derived forms. I am not sure that adding a local natural language processing dependency (Python's NLTK library) is a good idea (e.g. in English, "moving" and "move" have very different meanings and thesaurus sets), so currently I still rely on the service providers to do the word stemming for the plugin. Wiktionary doesn't provide built-in word stemming, which could cause problems if I made it a backend.
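To make the "moving" vs. "move" concern concrete, here is a toy sketch. Everything in it is hypothetical: `naive_stem` is a deliberately crude suffix stripper (not NLTK), and `THESAURUS` is invented sample data standing in for a real backend.

```python
def naive_stem(word):
    """Toy English stemmer: strip a trailing 'ing' and restore a final 'e'.

    Hypothetical and far cruder than a real stemmer; it only exists to
    show what happens when a word is reduced before lookup.
    """
    if word.endswith("ing") and len(word) > 5:
        return word[:-3] + "e"
    return word


# Invented sample data; a real backend would query a service instead.
THESAURUS = {
    "move": ["shift", "relocate", "transfer"],
    "moving": ["touching", "poignant", "stirring"],
}


def lookup(word, stem_first):
    """Look up synonyms, optionally stemming the query word first."""
    key = naive_stem(word) if stem_first else word
    return THESAURUS.get(key, [])


# The same query gives unrelated synonym sets depending on stemming.
print(lookup("moving", stem_first=False))
print(lookup("moving", stem_first=True))
```

With stemming applied first, a query for the adjective "moving" is answered with the synonyms of the verb "move", which is why stemming locally before every lookup is risky.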

Ron89 commented 7 years ago

Not implementable. Closing.