Closed TejMakode1523 closed 2 months ago
Hi @TejMakode1523
Thanks for reaching out and describing your issue in detail. Yes, we normalize Indic numerals to English numerals when pre-processing the input text, so the model operates only on English numerals. I would like to highlight that this behavior is by design for outputs from the IndicTrans2 models.
You can change this behavior when post-processing the translation outputs. A simple approach is to create a dictionary that maps English numerals to their respective Indic numerals and apply it to the translated text with ordinary string operations.
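As a rough illustration of that idea, here is a minimal sketch for a Devanagari target such as Marathi or Hindi. The helper name `to_devanagari_numerals` is hypothetical and not part of IndicTrans2; it only assumes the translation output is a plain Python string.

```python
# Hypothetical post-processing helper; not part of the IndicTrans2 codebase.
# Maps each English (ASCII) digit to the Devanagari digit at the same position.
DEVANAGARI_DIGITS = "०१२३४५६७८९"
TO_DEVANAGARI = str.maketrans("0123456789", DEVANAGARI_DIGITS)


def to_devanagari_numerals(text: str) -> str:
    """Replace every ASCII digit in `text` with its Devanagari counterpart."""
    return text.translate(TO_DEVANAGARI)


print(to_devanagari_numerals("123"))  # prints १२३
```

Note that this replaces every ASCII digit in the output, so digits that should stay in English (for example inside URLs or product codes) would need to be excluded separately.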
I hope this helps you.
Thank you for your prompt response and clarification regarding the behavior of the IndicTrans2 model with numerals. I appreciate your explanation that the normalization of Indic numerals to English numerals during pre-processing is intentional.
Your suggestion to handle numeral translation during post-processing using a dictionary mapping English numerals to Indic numerals sounds like a practical solution. I will implement this approach and test its effectiveness in transforming numeral outputs as needed.
You can find the mapping that we use to normalize Indic numerals to English numerals here.
You will need to invert this mapping, divide it into language- or script-specific mappings, and use the appropriate one based on the target language / script during postprocessing.
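One possible way to do the inversion and per-script split is sketched below. The `INDIC_TO_ENGLISH` dictionary is an illustrative stand-in built from Unicode digit ranges for three scripts, not the actual mapping from the repository, and `SCRIPT_BY_SUFFIX` assumes target-language codes of the form `mar_Deva` or `ben_Beng`. All function names are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Illustrative stand-in for the combined Indic -> English numeral mapping.
# Built from the Unicode code points of the digit zero in each script.
INDIC_TO_ENGLISH = {
    chr(zero + d): str(d)
    for zero in (0x0966, 0x09E6, 0x0AE6)  # Devanagari, Bengali, Gujarati
    for d in range(10)
}

# Assumed convention: the script is the suffix of the language code.
SCRIPT_BY_SUFFIX = {"Deva": "DEVANAGARI", "Beng": "BENGALI", "Gujr": "GUJARATI"}


def build_script_tables(indic_to_english):
    """Invert the mapping and split it into one translation table per script."""
    per_script = defaultdict(dict)
    for indic, english in indic_to_english.items():
        # Unicode names look like "DEVANAGARI DIGIT ONE"; keep the script part.
        script = unicodedata.name(indic).split(" DIGIT ")[0]
        per_script[script][english] = indic
    return {script: str.maketrans(m) for script, m in per_script.items()}


def localize_numerals(text, tgt_lang, tables):
    """Convert ASCII digits to the target script; leave text as-is if unknown."""
    table = tables.get(SCRIPT_BY_SUFFIX.get(tgt_lang.split("_")[-1]))
    return text.translate(table) if table else text


tables = build_script_tables(INDIC_TO_ENGLISH)
print(localize_numerals("123", "mar_Deva", tables))  # prints १२३
```

Deriving the script from the Unicode character name avoids hand-maintaining one dictionary per script, at the cost of relying on the `<SCRIPT> DIGIT <NAME>` naming pattern holding for every character in the mapping.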
The IndicTrans2 model does not correctly translate numerals from one Indian language to another. When translating text that includes numerals, the numerals remain in the source language rather than being translated into the target language's numeral system.
Steps to Reproduce
Expected Behavior
Numerals should be translated into the target language's numeral system. For example, in the case of Hindi to Marathi translation:
Actual Behavior
The numerals remain in the source language format (123) instead of being translated to the target language format (१२३).
Environment
Additional Context
This issue affects the readability and correctness of translations in documents where numerals play a significant role, such as legal, educational, and technical documents.
Suggested Solution
Implement a numeral translation mapping within the model to handle the conversion of numerals from the source language to the target language's numeral system.
Thank you for your attention to this issue. Please let me know if any additional information or examples are needed.
Best regards, [Tejas Makode]