Open FD81 opened 2 years ago
This is a great suggestion, but it would introduce yet another dependency. We would soon end up with one dependency per language (Chinese, Arabic, Japanese, Hindi, etc.), which is a nightmare to maintain. That is why we rely on a single, albeit old, UDPipe dependency, which provides access to several languages. If there's a library that covers more than a single language, we would be happy to look into it.
What's your use case?
I am interested in better pre-processing of Arabic texts, which I am guessing could also be of interest to other potential users.
Currently the best option for pre-processing Arabic is UDPipe, but it is old and, in my view, not very reliable. CAMeL Tools, by contrast, is a very recent development in Arabic NLP and seems to work well. The other Arabic pre-processing options I have seen are low-quality. I want to apply NLP to Arabic texts, and CAMeL Tools would be very helpful for that.
What's your proposed solution?
Add the option to pre-process texts with CAMeL Tools. You can find out more about it here: https://github.com/CAMeL-Lab/camel_tools

**Are there any alternative solutions?**
No, this is the best Arabic NLP tool currently available that I am aware of.

Thanks for your kind attention. I hope I am submitting this suggestion in the right place.
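
For illustration, here is a minimal sketch of what an optional CAMeL Tools pre-processing step might look like. It is not a proposal for this package's actual API. The function name `preprocess_arabic` and the regex fallback are hypothetical; only `simple_word_tokenize` and `dediac_ar` come from CAMeL Tools. The import is wrapped so that CAMeL Tools stays an optional dependency, which may ease the maintenance concern about adding one hard dependency per language.

```python
import re

# CAMeL Tools is treated as optional: if it is not installed,
# a crude regex-based fallback is used instead.
try:
    from camel_tools.tokenizers.word import simple_word_tokenize
    from camel_tools.utils.dediac import dediac_ar
    HAVE_CAMEL = True
except ImportError:
    HAVE_CAMEL = False

# Fallback only: common Arabic diacritics plus tatweel (assumed range).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")


def preprocess_arabic(text):
    """Hypothetical helper: strip diacritics, then split into tokens."""
    if HAVE_CAMEL:
        # dediac_ar removes Arabic diacritics; simple_word_tokenize
        # splits the result into word and punctuation tokens.
        return simple_word_tokenize(dediac_ar(text))
    # Without CAMeL Tools: drop diacritics and split on whitespace only,
    # so punctuation stays attached to words.
    return _DIACRITICS.sub("", text).split()


if __name__ == "__main__":
    sample = "\u0630\u064E\u0647\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F"
    print(preprocess_arabic(sample))
```

The morphological analysis and disambiguation components of CAMeL Tools would be the more interesting part for real use, but they need separately downloaded data, so they are left out of this sketch.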