Closed: arky closed this issue 2 years ago
Hi @arky
Thanks for bringing this issue to our attention. We'll look into adding Khmer to our supported languages.
Kind regards
@Rosencrantz Let me know if I can pitch in. I would like to help build better ingest support for South and Southeast Asian languages.
Hi @arky, we would appreciate the help for sure. We have some documentation on how to add a new language to Aleph at https://docs.alephdata.org/developers/technical-faq#how-do-i-add-support-for-a-new-language-to-aleph. The second part of the section describes how to add a new language for the ingestion pipeline.
If you could make a PR to add Khmer language support, we would be happy to merge it, and to help you through the process.
Thank you @sunu, I have added support for ingestion of Khmer documents. Unfortunately, there isn't a spaCy model for the Khmer language yet.
Thanks a lot @arky! I'll make sure we merge your PRs in next week before the next Aleph release.
OCR support for the Khmer language is now available in Aleph 3.12.0.
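For anyone who wants to confirm that Khmer OCR data is present on their own machine, here is a minimal sketch. It assumes the OCR backend is Tesseract and that its Khmer language pack uses the code `khm`. The helper names `parse_list_langs` and `installed_languages` are hypothetical and are not part of Aleph.

```python
import shutil
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs` into a set of codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        langs.add(line)
    return langs


def installed_languages() -> set[str]:
    """Ask a locally installed tesseract binary which languages it has.

    Returns an empty set when no tesseract binary is found on PATH.
    """
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Read both streams in case the list is written to stderr.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    # "khm" is assumed to be the Tesseract code for Khmer.
    print("Khmer OCR data installed:", "khm" in installed_languages())
```

If this prints `False`, the Khmer language data is probably missing from the Tesseract installation that the script can see, which may not be the one used inside the Aleph ingest container.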
It is not possible to ingest Khmer language documents for OCR, as the language is not available in the investigation.