biolab / orange3-text

🍊 📄 Text Mining add-on for Orange3

text pre-processing with CAMeL #781

Open FD81 opened 2 years ago

FD81 commented 2 years ago

What's your use case?

I am interested in better pre-processing of Arabic texts, which I suspect could also be of interest to other users.

Currently, the best option for pre-processing Arabic is UDPipe, but it is old and, in my view, not very reliable. CAMeL, by contrast, is a very recent development in Arabic NLP and seems to work well. Other Arabic pre-processing options are low-quality.

I am interested in applying NLP to Arabic texts and CAMeL would be very helpful.

What's your proposed solution?

Add an option to pre-process texts with CAMeL Tools. You can find out more about it here: https://github.com/CAMeL-Lab/camel_tools

Are there any alternative solutions?

No, this is the best Arabic NLP tool currently available that I am aware of. Thanks for your kind attention; I hope I am submitting this suggestion in the right place.
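To make concrete what "better pre-processing of Arabic" typically involves, here is a minimal, dependency-free Python sketch of two common normalization steps (removing diacritics and unifying alef variants) followed by naive whitespace tokenization. It is an illustration only: it does not use CAMeL Tools, whose own utilities (for example `dediac_ar` in `camel_tools.utils.dediac`) are more complete, and the function names below are hypothetical.

```python
import re

# Arabic diacritics U+064B..U+0652: tanwin (fathatan, dammatan, kasratan),
# short vowels (fatha, damma, kasra), shadda and sukun.
_DIACRITICS = re.compile("[\u064B-\u0652]")

# Alef with madda (U+0622), hamza above (U+0623) and hamza below (U+0625).
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop diacritic marks, keep the base letters."""
    return _DIACRITICS.sub("", text)


def normalize_alef(text: str) -> str:
    """Hypothetical helper: map alef variants to plain alef (U+0627)."""
    return _ALEF_VARIANTS.sub("\u0627", text)


def preprocess(text: str) -> list[str]:
    """Normalize, then split on whitespace (a deliberately naive tokenizer)."""
    return normalize_alef(strip_diacritics(text)).split()
```

For example, `preprocess("أَهْلًا وَسَهْلًا")` returns `["اهلا", "وسهلا"]`. A real pipeline would replace the whitespace split with a proper tokenizer and morphological analysis, which is what a dedicated tool such as CAMeL Tools adds.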
ajdapretnar commented 2 years ago

This is a great suggestion, but the problem is that it introduces yet another dependency. This would soon lead to a separate dependency for each language (Chinese, Arabic, Japanese, Hindi, etc.), which is a nightmare to maintain. Hence we rely on a single, albeit old, UDPipe dependency, which provides access to several languages. If there's a library that covers more than one language, we would be happy to look into it.