Open: labKnowledge opened this issue 1 month ago
Hi @labKnowledge!
First, Rundi is already present in the NLLB-200 model. Please take a look at the NLLB paper, where run_Latn is included in the language list on page 15. You could also try our translation interface, co-branded with UNESCO and Hugging Face, at https://huggingface.co/spaces/UNESCO/nllb, where translation for Rundi is supported.
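For anyone who wants to try this locally rather than through the web interface, here is a minimal sketch using the Hugging Face transformers library. It assumes the public facebook/nllb-200-distilled-600M checkpoint and a working PyTorch install; the helper names (FLORES_CODES, flores_code, translate) and the sample sentence are made up for illustration and are not part of NLLB itself. The only Rundi-specific part is the run_Latn language code.

```python
# FLORES-200 codes used by NLLB-200; run_Latn is the code for Rundi (Kirundi).
FLORES_CODES = {
    "english": "eng_Latn",
    "french": "fra_Latn",
    "kinyarwanda": "kin_Latn",
    "rundi": "run_Latn",
    "kirundi": "run_Latn",  # alias for the same language
}

# Assumption: this public checkpoint is chosen for illustration only.
MODEL_NAME = "facebook/nllb-200-distilled-600M"


def flores_code(language: str) -> str:
    """Map a language name to its FLORES-200 code (case-insensitive)."""
    try:
        return FLORES_CODES[language.strip().lower()]
    except KeyError:
        raise ValueError(f"No FLORES-200 code registered for {language!r}") from None


def translate(text: str, source: str = "english", target: str = "rundi") -> str:
    """Translate text with NLLB-200; downloads the checkpoint on first use."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang=flores_code(source))
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    inputs = tokenizer(text, return_tensors="pt")
    # The target language is selected by forcing its code as the first generated token.
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(flores_code(target)),
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


if __name__ == "__main__":
    print(translate("Clean water is essential for good health."))
```

Swapping the source and target arguments gives Rundi-to-English translation. Output quality will vary by checkpoint size and domain, so treat the results as a starting point for evaluation rather than a finished translation.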
Second, there are currently no specific plans to release a new version of NLLB. However, there are specific steps you could take to improve Kirundi translation in future releases of other multilingual translation models.
🚀 Feature Request
Include the Kirundi (Rundi) language in the No Language Left Behind (NLLB) project
Motivation
The lack of Kirundi in NLLB creates several critical issues:
Limited access to information: Kirundi speakers struggle to access vital information in their native language, especially in areas like health, education, and technology.
Hindered software development: Local developers face significant challenges in creating applications and services tailored to Kirundi speakers, limiting innovation and economic growth in our region.
Digital divide: The absence of Kirundi in major language models like NLLB widens the digital divide, leaving our community behind in the rapidly advancing world of AI and natural language processing.
Cultural preservation: Without proper representation in language models, there's a risk of Kirundi losing its digital presence, potentially impacting its long-term preservation and evolution.
Pitch
We propose the inclusion of Kirundi in the NLLB project. This addition would:
Enable accurate translation to and from Kirundi, facilitating better communication and information exchange for millions of speakers.
Empower local developers to create more sophisticated, language-specific applications and services.
Enhance natural language understanding capabilities for Kirundi, opening doors for advanced AI applications in areas such as voice recognition, text-to-speech, and sentiment analysis.
Contribute to the digital preservation of Kirundi, ensuring its relevance in the digital age.
Align with NLLB's mission of language inclusivity and bridging the gap for underrepresented languages.
Alternatives
While alternatives are limited, some developers have attempted to:
Use closely related languages like Kinyarwanda as a proxy, but this leads to inaccuracies and doesn't fully capture the nuances of Kirundi.
Develop smaller, less efficient language models specifically for Kirundi, but these lack the resources and scale of NLLB.
Rely on human translation, which is time-consuming, expensive, and does not scale to large applications.
These alternatives are insufficient and emphasize the need for Kirundi's inclusion in a comprehensive project like NLLB.
Additional context
We have a growing tech community eager to leverage advanced language models. The inclusion of Kirundi in NLLB would catalyze numerous projects and potentially transform our digital landscape.
Furthermore, Burundi's unique linguistic situation, with Kirundi as the primary language alongside French and English, presents an interesting case study for multilingual societies and could provide valuable data for improving NLLB's capabilities in similar contexts.
We are ready and willing to collaborate in any way possible to facilitate this inclusion, including providing language data, expert knowledge, and testing support.