Hello, dear developers!
My name is Mukhammadsaid, and I am a third-year B.Sc. Computer Science student at Inha University in Tashkent. This issue has little to do with this repository, but I need to ask you for help.
As you may know, Uzbek is a low-resource language, even though it is the second most widely spoken Turkic language after Turkish (approximately 34-35 million speakers). This prompted me to build at least something for Uzbek, and I chose a spell-checker. I later found that Hunspell and similar tools might not work well for Uzbek, and that the right approach was to build it with finite-state transducers (FSTs). So I began collecting the morphotactic rules from various sources and built an MVP of a morphological analyzer, UzMorph, in foma. I hired two linguists to annotate the lexicon for the analyzer. Following Oflazer's work, I found many similarities between Turkish and Uzbek, although Uzbek stands apart from other Turkic languages because of Persian influence. I can send you a draft of the paper describing the morphological analyzer if you are interested.
Then I built Tahrirchi (Uzbek for "editor"), a website and mobile keyboard for Uzbek that runs entirely on FSTs.
However, further research and development require funding. The current morphological analyzer recognizes around 97-98% of words, so there is still room for improvement. Moreover, if we could build a treebank with the help of the analyzer, we could create many other tools for Uzbek, such as a tagger.
The problem is that getting any funding from the government seems unlikely. I have tried every way of proposing the project to the government, but my efforts have borne no fruit. I wonder whether there is even a small chance that Google Research might be interested in the Uzbek language. I am opening this issue here because I could not find the right email address at Google Research for this kind of request. I would be very grateful if you could help me with this. Thank you for your understanding! Teşekkür ederim! (Turkish for "Thank you!")