Caucasus-Rosetta / Lingua-Corpus

Multilingual and monolingual corpora focused on Caucasus languages, for Natural Language Processing (NLP)
Apache License 2.0

Multilingual dictionary parsing #9

Closed by Bachstelze 3 years ago

Bachstelze commented 4 years ago

Parsing translation dictionaries would be a good starting point for other low-resource languages. If only scanned images of the dictionaries are available, we should start by adding OCR support. Which translation dictionaries could be used?

danielinux7 commented 4 years ago

Languages from the same family: the East and West Adyghe languages, Abaza, and Ubykh. I'll do some research.

danielinux7 commented 4 years ago

In agile terms, I think this is more like a story, since it is a continuous effort. Maybe it should be moved to the backlog, phrased as a user story, with related tasks created under it. I added 3 dictionaries that need OCR to the draft folder, and added resources to the readme file.

Bachstelze commented 4 years ago

Developing OCR tools is a separate project in its own right, so we should instead translate the parsed ab-ru dictionary with Glosbe. We have the original dictionary for validation.
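
The validation step could look roughly like the sketch below. It assumes both the original dictionary and the machine-translated output are available as plain-text lines of the form `headword<TAB>translation; translation`. That layout, the function names `parse_entries` and `validate`, and the placeholder entries are all hypothetical and not taken from the repository. How the translations are fetched from Glosbe is deliberately left out.

```python
from collections import defaultdict


def parse_entries(lines, sep="\t"):
    """Parse 'headword<sep>translation; translation' lines.

    Returns a dict mapping each lowercased headword to a set of
    lowercased translations. The line layout is an assumption.
    """
    entries = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or sep not in line:
            continue  # skip blank or malformed lines
        head, translations = line.split(sep, 1)
        for translation in translations.split(";"):
            translation = translation.strip().lower()
            if translation:
                entries[head.strip().lower()].add(translation)
    return dict(entries)


def validate(candidates, reference):
    """Compare candidate translations against the reference dictionary.

    Only headwords present in both dictionaries are scored. A headword
    counts as a match if at least one candidate translation also
    appears in the reference. Returns (match ratio, mismatched headwords).
    """
    shared = [head for head in candidates if head in reference]
    mismatches = [h for h in shared if not (candidates[h] & reference[h])]
    ratio = (len(shared) - len(mismatches)) / len(shared) if shared else 0.0
    return ratio, mismatches


# Placeholder entries, not real dictionary data.
reference = parse_entries(["w1\tr1; r2", "w2\tr3"])
candidates = parse_entries(["w1\tR2", "w2\tr9", "w3\tr4"])
ratio, mismatches = validate(candidates, reference)
print(ratio, mismatches)
```

Exact string overlap is a strict criterion. Inflected forms or synonyms would be counted as mismatches, so the list of mismatched headwords is probably more useful for manual review than the ratio alone.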