AI4Bharat / IndicTrans2

Translation models for 22 scheduled languages of India
https://ai4bharat.iitm.ac.in/indic-trans2
MIT License

use with ctranslate #65

Closed pr509 closed 4 months ago

pr509 commented 5 months ago

When I try to convert a trained fairseq model to CTranslate2, I get a source_vocabulary.json and a target_vocabulary.json. Of these two, target_vocabulary.json is full of Unicode escape codes, and because of this the translation quality is poor. Could you please help me find a solution?

PranjalChitale commented 5 months ago

After porting a fairseq model to CTranslate2 with ct2-fairseq-converter, the source and target vocabularies are dumped in txt format, not json format, so this issue should never arise.

Can you share more details about the command you used, and double-check which model you were converting?
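
While gathering those details, it may help to check whether the escape codes are only an on-disk encoding. Below is a minimal sketch using only the Python standard library. It assumes the JSON vocabulary is a plain list of tokens and the txt vocabulary has one token per line; the file names, the sample tokens, and the helpers `load_vocab` and `has_unicode_escapes` are hypothetical and are not part of CTranslate2 or IndicTrans2.

```python
import json
import tempfile
from pathlib import Path


def load_vocab(path):
    """Load tokens from a .json file (assumed: a list of tokens)
    or from a .txt file (assumed: one token per line)."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)  # \uXXXX escapes are decoded here
    return text.splitlines()


def has_unicode_escapes(path):
    """Return True if the raw file text contains a literal backslash-u."""
    raw = Path(path).read_text(encoding="utf-8")
    return "\\u" in raw


# Hypothetical sample tokens, used only to demonstrate the round trip.
tokens = ["<unk>", "▁नमस्ते", "▁भारत"]

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "target_vocabulary.json"
    txt_path = Path(tmp) / "target_vocabulary.txt"

    # json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
    # so the file on disk shows \uXXXX sequences instead of the characters.
    json_path.write_text(json.dumps(tokens), encoding="utf-8")
    txt_path.write_text("\n".join(tokens) + "\n", encoding="utf-8")

    escaped_on_disk = has_unicode_escapes(json_path)
    json_tokens = load_vocab(json_path)
    txt_tokens = load_vocab(txt_path)

print("escapes on disk:", escaped_on_disk)
print("same tokens after loading:", json_tokens == txt_tokens)
```

If the tokens loaded from the JSON file match the expected ones, the escapes by themselves are probably not what is degrading the translations, and the conversion command and model are the next things to check.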