Incorporating Underrepresented Languages: A Focus on Low-Resource Languages

In this step, we address the challenge of incorporating underrepresented languages with a focus on low-resource languages. This effort confronts the prevalent imbalance in NLP systems, which are predominantly oriented towards high-resource languages such as English, Chinese, and Spanish. These languages benefit from extensive digital resources, including large text corpora, facilitating their dominance in NLP research. Conversely, low-resource languages like Lao and Sanskrit are characterized by a scarcity of digital resources. Our aim is to highlight these underrepresented languages (Lao and Sanskrit as the candidates from this group), recognizing and exploring their unique linguistic features. By integrating these languages, we strive to develop truly language-agnostic system and embrace the full spectrum of global linguistic diversity.

fani-lab / LADy

Incorporating Underrepresented Languages: A Focus on Low-Resource Languages #70