Open farinamhz opened 5 months ago
For the backtranslation phase of our experiments with these languages, we use `nllb`. The language parameters are `lao_Laoo` for Lao and `san_Deva` for Sanskrit. The outcomes of these experiments will be integrated into LADy version 0.2.0.0, which already contains results from the `nllb` translator.
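To make the backtranslation step concrete, here is a minimal sketch of a round trip through one of the two language codes. It is not LADy's actual implementation: the helper names (`backtranslate`, `make_nllb_translator`), the `eng_Latn` source language, and the `facebook/nllb-200-distilled-600M` checkpoint are assumptions chosen for illustration. Only `lao_Laoo` and `san_Deva` come from the description above.

```python
from typing import Callable

# FLORES-200 style codes used by NLLB. The Lao and Sanskrit codes are the ones
# named in this issue; English as the source language is an assumption.
NLLB_CODES = {
    "english": "eng_Latn",
    "lao": "lao_Laoo",
    "sanskrit": "san_Deva",
}

# A translator takes (text, source_code, target_code) and returns translated text.
Translator = Callable[[str, str, str], str]


def backtranslate(text: str, pivot: str, translate: Translator,
                  source: str = "english") -> tuple[str, str]:
    """Translate source -> pivot -> source; return (forward, backward) texts."""
    src, tgt = NLLB_CODES[source], NLLB_CODES[pivot]
    forward = translate(text, src, tgt)
    backward = translate(forward, tgt, src)
    return forward, backward


def make_nllb_translator(
    model_name: str = "facebook/nllb-200-distilled-600M",  # assumed checkpoint
) -> Translator:
    """Build a Translator backed by an NLLB checkpoint from Hugging Face."""
    # Imported lazily so backtranslate() works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def translate(text: str, src_code: str, tgt_code: str) -> str:
        tokenizer.src_lang = src_code
        inputs = tokenizer(text, return_tensors="pt")
        output = model.generate(
            **inputs,
            # Force the decoder to start with the target language token.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
            max_new_tokens=128,
        )
        return tokenizer.batch_decode(output, skip_special_tokens=True)[0]

    return translate
```

With `transformers` and `torch` installed, `backtranslate("The food was great.", "lao", make_nllb_translator())` would return the Lao translation and its English backtranslation. Passing the translator in as a callable keeps the round-trip logic independent of the model, so another translator can be swapped in without touching it.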
In this step, we address the challenge of incorporating underrepresented languages, with a focus on low-resource languages. This effort confronts the prevalent imbalance in NLP systems, which are predominantly oriented towards high-resource languages such as English, Chinese, and Spanish. These languages benefit from extensive digital resources, including large text corpora, which facilitates their dominance in NLP research. Conversely, low-resource languages like Lao and Sanskrit are characterized by a scarcity of digital resources. Our aim is to highlight these underrepresented languages (with Lao and Sanskrit as the candidates from this group), recognizing and exploring their unique linguistic features. By integrating these languages, we strive to develop a truly language-agnostic system and embrace the full spectrum of global linguistic diversity.