CAMeL-Lab / camel_tools

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
MIT License

[QUESTION] Disambiguation using unfactored bert model does not yield same results as using the Camelira Web Interface #130

Open amsu2 opened 9 months ago

amsu2 commented 9 months ago

I installed the project and followed all of the setup steps.

I used the example code from https://camel-tools.readthedocs.io/en/stable/api/disambig/bert.html#examples.

I tried various input sentences. In nearly every sentence, the last letter of a word, most often a verb, is left without a diacritic.

More importantly, every so often a word is disambiguated completely differently from what the Camelira website produces, and the weightings are also different.

Example:
Input: وهي مدرسة
Output: وَهِيَ مَدْرَسَةٌ
Camelira Website Output: وَهِيَ مُدَرِّسَةٌ

For some words, the difference is not just in the weightings or in the choice between two results scored 1.0; the analysis itself is completely different.

Example:
Input: مهمة
Output: مَهَمَّةً
Camelira Website Output: 30 versions of مُهِمَّةٌ; mahammah does not appear even once.
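
To compare more than the single top-ranked result against the website, the ranked analyses and their scores can be listed per word. The following is a minimal sketch, not a confirmed reproduction of what Camelira does. `summarize` and `run_camel_tools` are hypothetical helper names. The `top` argument of `pretrained()` and the `word`, `analyses`, `score`, and `analysis['diac']` fields are assumed from the camel_tools documentation and may differ between versions.

```python
def summarize(disambiguated, top_n=3):
    """Return [(word, [(score, diac), ...])], best-scored analyses first.

    `disambiguated` is any iterable of objects exposing `.word` and
    `.analyses`, where each analysis has `.score` and an `.analysis` dict
    (assumed to match camel_tools' disambiguator output).
    """
    rows = []
    for d in disambiguated:
        # Sort explicitly by score so the ranking does not depend on input order.
        ranked = sorted(d.analyses, key=lambda a: a.score, reverse=True)
        rows.append(
            (d.word, [(a.score, a.analysis.get('diac')) for a in ranked[:top_n]])
        )
    return rows


def run_camel_tools(text, top_n=3):
    """Disambiguate `text` with the unfactored BERT model and summarize it.

    Imported lazily so `summarize` can be used without the models installed.
    Assumption: `pretrained()` accepts `top` to keep several analyses per word.
    """
    from camel_tools.disambig.bert import BERTUnfactoredDisambiguator
    from camel_tools.tokenizers.word import simple_word_tokenize

    bert = BERTUnfactoredDisambiguator.pretrained(top=top_n)
    tokens = simple_word_tokenize(text)
    return summarize(bert.disambiguate(tokens), top_n)
```

Calling `run_camel_tools('وهي مدرسة', top_n=5)` and printing each row would show whether the form the website returns appears lower in the local ranking or is missing from the local analyses entirely.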

Thanks in advance for your help. I'm a CS student and have been interested in linguistics and Arabic for a few years now; I'm a big fan of your work. This would really help me.

Windows 10, Python 3.9

Hamed1Hamed commented 7 months ago

I have the same issue!