GrammaticalFramework / gf-wordnet

A WordNet in GF
https://cloud.grammaticalframework.org/wordnet/

Russian #31

Closed harisont closed 9 months ago

harisont commented 2 years ago

Hi!

Would it be possible to add Russian in the very near future, and if so, roughly how long would that take? Or how could I do it myself?

I'm trying to adapt some GF-based grammar exercises to Russian for this beginner Swedish course aimed at Ukrainian refugees (https://github.com/elenavolodina/SwedishFromScratch), and the GF WordNet would make everything much easier.

krangelov commented 2 years ago

Hi Arianna,

Building a Russian WordNet itself is quite easy and I really want it done. The problem is importing a morphological lexicon. There is a really good external resource:

https://gramdict.ru/

Unfortunately, the syntax of the data is very complex. I read the instructions, but it looks like parsing it completely automatically is not going to work. There is work in progress on parsing the data here:

https://github.com/gramdict/zalizniak-2010-waxeye-grammar

This would be the first thing to try. We probably don't even have to import all the data; something like 30-40k words might be good enough. If you can help with bootstrapping the morphology, then I can quite easily build the WordNet. My Russian is quite rusty, but it should be enough for some basic validation of the result.

Best Regards, Krasimir

harisont commented 2 years ago

Thank you! As I was saying yesterday, I will try to take a look at this after a first version of the program for the exercises is up and running.

krangelov commented 2 years ago

Now there is a Russian WordNet. I checked a few hundred items and the quality is pretty good. Of course there are mistakes as well. I guess it helps a lot that the algorithms can use Bulgarian as a pivot language.