RiadKatby / arabic-stemmer

Implementation of Shereen Khoja's Arabic word stemming algorithm in C# and .NET 6
MIT License
arabic-nlp artificial-intelligence csharp dotnet6 machine-learning natural-language-processing

Arabic stemmer

This is a .NET and C# implementation of Shereen Khoja's algorithm for stemming Arabic words, which is based on common patterns in the Arabic language. I used this implementation in a set of machine learning and natural language processing projects built on Microsoft technologies, so it seemed helpful to make it available to the Microsoft community.

Please follow this link to find Shereen Khoja's original implementation and documentation.

What is stemming?

Stemming is the process of removing affixes from words and reducing them to their roots. For example, stemming the English word "computing" produces the root "comput". This is the same root produced by the word "computation".
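The English example above can be sketched with a toy suffix stripper. This is a minimal illustration in Python, not the Khoja algorithm and not this repository's C# API; the `toy_stem` function and its `SUFFIXES` list are hypothetical names invented for the example, and a real Arabic stemmer also has to handle prefixes, infixes, and root patterns.

```python
# Toy English suffix stripper, for illustration only.
# NOT the Khoja algorithm and NOT this repository's API.

# Hypothetical suffix list, ordered so longer suffixes are tried first.
SUFFIXES = ("ation", "ing", "ers", "er", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(toy_stem("computing"))    # comput
print(toy_stem("computation"))  # comput
```

Both words reduce to the same string, which is what lets later processing treat them as one term.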

What is stemming useful for?

After words are reduced to their roots, the roots can be used in compression, spell checking, text searching, and text analysis.

Compression

To reduce the size of documents, long words could be stored in their root form. A small program would then restore the document to its original form when it is opened, using context and grammar to determine the original form of each word.

Spell checking

Instead of searching a dictionary for the complete word, only the root is looked up. This reduces the size of the dictionary.

Text searching

Web search engines are the best example of this. Searching for the root of a word gives a wider search than trying to find an exact match.
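As a rough sketch of root-based searching, the Python snippet below indexes a few made-up documents by root and looks up a query by its root. Everything here is an assumption made for illustration: the `toy_stem` suffix stripper stands in for a real stemmer (it is not the Khoja algorithm), and `build_index`, `search`, and the sample documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical English suffix list; a stand-in for a real stemmer.
SUFFIXES = ("ation", "ing", "ers", "er", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each root to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents sharing the query's root."""
    return sorted(index.get(toy_stem(query), set()))


# Made-up sample documents.
docs = {
    1: "computing is fun",
    2: "a history of computation",
    3: "gardening tips",
}
index = build_index(docs)
print(search(index, "computers"))  # [1, 2]
```

The query "computers" appears in none of the documents verbatim, yet it matches the two that contain words with the same root.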

Text analysis

In statistical text analysis, for example, stemming helps map grammatical variations of a word to instances of the same term.
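A small Python sketch of that mapping: term frequencies are counted over roots rather than surface forms, so variations of a word add up under one key. The `toy_stem` function and the token list are hypothetical and only illustrate the idea; they are not the Khoja algorithm or this repository's API.

```python
from collections import Counter

# Hypothetical English suffix list; a stand-in for a real stemmer.
SUFFIXES = ("ation", "ing", "ers", "er", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Made-up tokens: five surface forms, two underlying terms.
tokens = ["computing", "computation", "computers", "gardens", "gardening"]

# Count occurrences per root instead of per surface form.
counts = Counter(toy_stem(token) for token in tokens)
print(counts["comput"])  # 3
print(counts["garden"])  # 2
```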