RiadKatby / arabic-stemmer

Implementation of Shereen Khoja's Arabic word stemming algorithm, written in C# on .NET 6
MIT License

Strange letter 'j' appears in the result #1

Open aalgrou opened 1 week ago

aalgrou commented 1 week ago

When processing Arabic text, such as the word "التعاون", an unexpected Latin character 'j' appears in the Arabic result. For example, instead of "عون", the result is "عjون".

aalgrou commented 1 week ago

I fixed it in the PR, please check.

RiadKatby commented 1 week ago

Thank you, Abdullah, for your contribution. However, this is how the algorithm is designed: the letter 'j' is used to locate the vowel letter (حروف العلة) in the resulting stem.

As you know, the alef is sometimes converted to waw and other times to yaa. In the example you wrote, the alef is converted to waw, which is why the 'j' sits directly after the letter 'ع'.

So if that hint is valuable to you, you can search for the index of the 'j'; if not, you can just replace 'j' with ''.
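
As a rough illustration of the two options above (find the index of the 'j', or strip it), here is a minimal sketch in Python. The function name `split_marker` and the constant `MARKER` are hypothetical and not part of this repository; the input is simply the stem string from the example in this issue. The same idea should carry over to C# with `string.IndexOf` and `string.Replace`.

```python
# Hypothetical helper, not part of the arabic-stemmer API.
# Assumption: the stemmer returns a plain string in which the Latin letter
# "j" acts as a marker, as described in the comment above.
MARKER = "j"


def split_marker(stem: str, marker: str = MARKER) -> tuple[str, int]:
    """Return (stem with the marker removed, index where the marker was found).

    The index is -1 when the stem contains no marker.
    """
    index = stem.find(marker)            # position of the first marker, or -1
    cleaned = stem.replace(marker, "")   # drop every occurrence of the marker
    return cleaned, index


if __name__ == "__main__":
    cleaned, index = split_marker("عjون")
    print(cleaned, index)
```

If the position hint is not needed, calling `stem.replace("j", "")` on its own is enough.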

aalgrou commented 1 week ago

@RiadKatby please check the code. I made some enhancements and restored the original 'j' logic, and I used your hint. Please check the PR.