Closed kurianbenoy closed 7 months ago
The IndicTrans2 model is trained on a general-domain corpus (BPCC), which may lack adequate representation of such abbreviations. You could consider fine-tuning to improve performance on such cases.
One additional point to note: sentence segmentation tools may inadvertently split sentences at the periods within these abbreviations, so incomplete sentences get passed to the model, which yields suboptimal translations.
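To illustrate the segmentation point, here is a minimal sketch in plain Python. It is not the IndicTrans2 preprocessing pipeline; the abbreviation list, the placeholder character, and both function names are assumptions made up for this example. It compares a naive period-based splitter with one that masks the periods of known abbreviations before splitting.

```python
import re

# Hypothetical abbreviation list, for illustration only; extend it for your data.
ABBREVIATIONS = ["Dr.", "Mr.", "Mrs.", "e.g.", "i.e.", "U.S.A."]

# Private-use character that temporarily stands in for a protected period.
PLACEHOLDER = "\ue000"

# Split on whitespace that follows sentence-final punctuation.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_split(text):
    """Split at every '.', '!' or '?' followed by whitespace."""
    return SENTENCE_BOUNDARY.split(text.strip())


def split_protecting_abbreviations(text):
    """Mask periods inside known abbreviations, split, then restore them."""
    protected = text.strip()
    # Handle longer abbreviations first so shorter ones cannot clobber them.
    for abbr in sorted(ABBREVIATIONS, key=len, reverse=True):
        masked = abbr.replace(".", PLACEHOLDER)
        protected = re.sub(r"\b" + re.escape(abbr), masked, protected)
    pieces = SENTENCE_BOUNDARY.split(protected)
    return [piece.replace(PLACEHOLDER, ".") for piece in pieces]


if __name__ == "__main__":
    sample = "Dr. Rao works at the U.S.A. office. He joined in 2020."
    print(naive_split(sample))
    print(split_protecting_abbreviations(sample))
```

On the sample text, the naive splitter breaks after "Dr." and "U.S.A.", while the protected version keeps both sentences whole. One known limitation of this sketch: an abbreviation that really does end a sentence will not produce a split, so the list should be tuned to the input data.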
Regarding (2), this is probably due to biases in the training data, which are hard to control directly. Fine-tuning the model may help here as well.
Thank you @PranjalChitale for suggesting what to do next. Are there any updates planned for the IndicTrans2 models anytime soon?
Hi @kurianbenoy
We do not plan to update IndicTrans2 models anytime soon. Thanks!