clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License
8.76k stars 1.58k forks

Arabic language support #231

Open adhaamehab opened 6 years ago

adhaamehab commented 6 years ago

Hi all,

I want to work on Arabic language support. Any advice on how I should start?

ghost commented 6 years ago

A good start is to first know the Arabic alphabet, and use an Arabic keyboard. You should be able to use an Arabic keyboard on almost every operating system, including iOS. Then, you should learn parts of speech and qualities like gender. Practicing grammar will also help. Then, you could start researching vocabulary from WordReference. The parser code for Arabic should not be too different from that of other languages, unlike for a language like Swiss German. Also, be aware that letters can change shape depending on whether they are at the beginning, middle, or end of a word, or isolated.
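To illustrate the point about letter shapes: Unicode has compatibility code points ("presentation forms") for the isolated, final, initial, and medial shapes of Arabic letters, although text is normally stored with the base letter and the renderer picks the shape. The sketch below uses only Python's standard `unicodedata` module to fold the four forms of the letter beh back to the base letter with NFKC normalization. The helper name `normalize_arabic_forms` is hypothetical, and whether Pattern would need such a step is an assumption, not something stated in this thread.

```python
import unicodedata


def normalize_arabic_forms(text):
    """Fold Arabic presentation forms back to their base letters.

    NFKC normalization applies Unicode compatibility decompositions, which
    map positional shapes (initial, medial, final, isolated) to the base letter.
    Note: this helper is illustrative and not part of Pattern.
    """
    return unicodedata.normalize("NFKC", text)


# The four positional presentation forms of the letter beh.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

if __name__ == "__main__":
    for ch in BEH_FORMS:
        base = normalize_arabic_forms(ch)
        print(unicodedata.name(ch), "->", unicodedata.name(base))
```

Normalizing early like this would let a lexicon lookup match a word regardless of which shape a source document happened to encode.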

adhaamehab commented 6 years ago

@MohamedAlFahim Thanks for your response. I'm a fluent Arabic speaker. My question is more about the process of adding a language extension.

ghost commented 6 years ago

In that case, you should start by creating an “ar” folder under pattern/text, then fill it with files like ar-verbs.txt (verb conjugation), ar-context.txt (sentence structure), ar-lexicon.txt (word list), and inflect.py (for modifying words). Again, Arabic shares certain structures with other languages, so it is possible to copy and adapt some existing structures (for example, from Pattern's English module).
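As a rough sketch of that first step, the script below creates the folder and empty placeholder files named in the comment above. The function name `scaffold_language` is hypothetical, and adding an `__init__.py` is an assumption made so that the folder can be imported as a package; the actual contents and formats of each file would have to be modeled on an existing language in the repository.

```python
from pathlib import Path

# File names come from the comment above. "__init__.py" is an added
# assumption so that the new folder is importable as a Python package.
LANGUAGE_FILES = [
    "{code}-verbs.txt",    # verb conjugation
    "{code}-context.txt",  # sentence structure
    "{code}-lexicon.txt",  # word list
    "inflect.py",          # word modification
    "__init__.py",         # assumption, see note above
]


def scaffold_language(text_dir, code):
    """Create <text_dir>/<code>/ with empty placeholder files.

    Existing files are left untouched. Returns the sorted file names.
    """
    lang_dir = Path(text_dir) / code
    lang_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for template in LANGUAGE_FILES:
        path = lang_dir / template.format(code=code)
        path.touch(exist_ok=True)  # does not overwrite existing content
        created.append(path.name)
    return sorted(created)


if __name__ == "__main__":
    import tempfile

    # Demonstrated in a temporary directory; in a real checkout, text_dir
    # would be the repository's pattern/text folder.
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold_language(Path(tmp) / "pattern" / "text", "ar"))
```

Because the files start out empty, this only fixes the layout; the real work is filling them in, for instance by adapting the corresponding English files.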