alvations / pywsd

Python Implementations of Word Sense Disambiguation (WSD) Technologies.
MIT License

Using pywsd in other languages (French, or others) #29

Open GMarzinotto opened 7 years ago

GMarzinotto commented 7 years ago

Good afternoon,

I was wondering if it would be possible to adapt this tool to other languages such as French or Spanish. If it is feasible, could you give me some indications on how to do these modifications?

Thank you very much!

alvations commented 7 years ago

@GMarzinotto it's a good suggestion. But the bulk of pywsd still relies on the various Lesk algorithms. To extend the code to other languages, we first have to get translations of the glosses (i.e. definitions) for every Synset.
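To show why the glosses are the bottleneck, here is a minimal simplified-Lesk sketch. It is not pywsd's own code: `simple_lesk`, the stopword list and the French glosses in `FRENCH_GLOSSES` are hypothetical, and the synset IDs are only illustrative. The overlap computation itself does not care about language; it only needs gloss text in the same language as the input sentence.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Hypothetical French glosses keyed by synset ID (illustrative only).
# A real port would need a translated gloss for every WordNet synset.
FRENCH_GLOSSES: Dict[str, str] = {
    "bank.n.01": "terrain en pente au bord d'une rivière",
    "bank.n.02": "institution financière qui accepte des dépôts et prête de l'argent",
}

# Tiny, hypothetical French stopword list; a real system would use a fuller one.
STOPWORDS_FR: Set[str] = {
    "à", "au", "d", "de", "des", "du", "en", "et", "l", "la", "le", "les", "qui", "un", "une",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on non-word characters (Unicode-aware), drop stopwords."""
    return {tok for tok in re.findall(r"\w+", text.lower()) if tok not in STOPWORDS_FR}


def simple_lesk(context: str, candidates: Iterable[str],
                glosses: Dict[str, str]) -> Optional[str]:
    """Return the candidate whose gloss shares the most words with the context.

    Ties go to the first candidate seen; returns None if there are no candidates.
    """
    context_tokens = tokenize(context)
    best_sense, best_overlap = None, -1
    for synset_id in candidates:
        overlap = len(context_tokens & tokenize(glosses.get(synset_id, "")))
        if overlap > best_overlap:
            best_sense, best_overlap = synset_id, overlap
    return best_sense


if __name__ == "__main__":
    sentence = "J'ai déposé de l'argent à la banque"
    # prints bank.n.02 (the shared word is "argent")
    print(simple_lesk(sentence, ["bank.n.01", "bank.n.02"], FRENCH_GLOSSES))
```

Under this sketch, supporting another language means swapping the gloss table and the tokenizer/stopword list; producing the glosses is the hard part.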

geekan commented 7 years ago

@alvations Question: how do we generate the synsets and glosses?

alvations commented 7 years ago

@geekan Do you mean "how to generate synsets and glosses" for other languages? If so, the first step is to take the synsets from the Open Multilingual Wordnet (OMW), which maps synsets in other languages to the Princeton WordNet IDs, and then find some way to translate the glosses from English into the other languages (manually or automatically).
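A rough sketch of that two-step pipeline, assuming OMW data in a tab-separated layout like its `wn-data-*.tab` files (`offset-pos`, `lang:lemma`, lemma). The sample lines, placeholder offsets, English glosses and the `toy_translate` lookup are all hypothetical; a real port would read the actual OMW files (NLTK can also expose them through `wn.synsets(word, lang='fra')` once its OMW data package is downloaded) and plug in a human or machine translation step.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Toy lines in the assumed OMW tab layout: "<offset>-<pos>\t<lang>:lemma\t<lemma>".
# The offsets are placeholders, not real Princeton WordNet offsets.
OMW_SAMPLE = (
    "# toy French sample\n"
    "00000001-n\tfra:lemma\tbanque\n"
    "00000002-n\tfra:lemma\trive\n"
    "00000002-n\tfra:lemma\tberge\n"
    "00000003-n\tfra:lemma\tchien\n"
)

# English glosses keyed by the same placeholder IDs (00000003-n deliberately has none).
ENGLISH_GLOSSES: Dict[str, str] = {
    "00000001-n": "a financial institution that accepts deposits",
    "00000002-n": "sloping land beside a body of water",
}

# Hypothetical stand-in for manual or machine translation of glosses.
TOY_TRANSLATIONS: Dict[str, str] = {
    "a financial institution that accepts deposits": "institution financière qui accepte des dépôts",
    "sloping land beside a body of water": "terrain en pente au bord d'une étendue d'eau",
}


def toy_translate(gloss: str) -> str:
    # Falls back to the English gloss when no translation is available.
    return TOY_TRANSLATIONS.get(gloss, gloss)


def parse_omw_tab(text: str, lang: str) -> Dict[str, List[str]]:
    """Map each synset ID to its lemmas in `lang`, skipping comments and other relations."""
    lemmas: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) >= 3 and parts[1] == f"{lang}:lemma":
            # Multiword lemmas are assumed to use underscores, as in WordNet.
            lemmas[parts[0]].append(parts[2].replace("_", " "))
    return dict(lemmas)


def build_sense_inventory(lemmas: Dict[str, List[str]],
                          english_glosses: Dict[str, str],
                          translate: Callable[[str], str]) -> Dict[str, dict]:
    """Attach a translated gloss to every synset that has one; skip the rest."""
    inventory: Dict[str, dict] = {}
    for synset_id, words in lemmas.items():
        gloss_en = english_glosses.get(synset_id)
        if gloss_en is not None:
            inventory[synset_id] = {"lemmas": words, "gloss": translate(gloss_en)}
    return inventory


if __name__ == "__main__":
    inventory = build_sense_inventory(parse_omw_tab(OMW_SAMPLE, "fra"),
                                      ENGLISH_GLOSSES, toy_translate)
    for synset_id, entry in sorted(inventory.items()):
        print(synset_id, entry["lemmas"], entry["gloss"])
```

The resulting table of lemmas plus target-language glosses is the kind of input a Lesk-style disambiguator needs. In this sketch, synsets with no English gloss (like `00000003-n`) are dropped rather than guessed.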

Just a straw poll: is scaling to other languages more important than implementing more state-of-the-art algorithms for English? If so, BabelNet would be an option, but it's non-commercial =(