The translation result from English to Korean using the 'Helsinki-NLP/opus-mt-tc-big-en-ko' model does not make sense at all
```python
from transformers import MarianMTModel, MarianTokenizer

# Path or Hub id of the model; assumed here to be the model named in the title
MODEL_PATH3 = "Helsinki-NLP/opus-mt-tc-big-en-ko"

src_text = [
    "2, 4, 6 etc. are even numbers.",
    "Yes."
]

tokenizer = MarianTokenizer.from_pretrained(MODEL_PATH3)
model = MarianMTModel.from_pretrained(MODEL_PATH3)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))
```
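Instead of the output shown in the example, ['2, 4, 6 등은 짝수입니다.', '그래'] ("2, 4, 6, etc. are even numbers.", "Yeah."), I get ['그들은,우리는,우리는 모자입니다. 신뢰할 수 있습니다.', 'ATP입니다.'] (roughly "They, we, we are hats. It can be trusted.", "It is ATP."), which has nothing to do with the input.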
I tried a few more sentences, and I believe the correct tokenizer or vocabulary file would fix this problem.
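If it helps narrow down the tokenizer/vocab hypothesis, below is a small diagnostic sketch. It assumes `MODEL_NAME` is the same model as `MODEL_PATH3` (the Hub id from the title), and the helper names `unk_fraction` and `inspect_translation` are hypothetical. It prints the vocabulary sizes reported by the tokenizer and the model config, the source SentencePiece pieces with the share mapped to `<unk>`, and the raw generated target pieces before detokenization. A high `<unk>` share or target pieces that look like unrelated words would point towards a tokenizer/model mismatch rather than the weights themselves, though this is only a heuristic.

```python
from typing import List, Sequence

# Assumption: MODEL_NAME stands in for MODEL_PATH3 and is the Hub id from the title.
MODEL_NAME = "Helsinki-NLP/opus-mt-tc-big-en-ko"


def unk_fraction(token_ids: Sequence[int], unk_id: int) -> float:
    """Return the share of ids equal to unk_id (0.0 for an empty sequence)."""
    if len(token_ids) == 0:
        return 0.0
    return sum(1 for t in token_ids if t == unk_id) / len(token_ids)


def inspect_translation(src_text: List[str], model_name: str = MODEL_NAME) -> None:
    """Print vocab sizes, source pieces, <unk> share and raw target pieces."""
    # Imported lazily so unk_fraction can be used without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Sizes can legitimately differ when source and target vocabularies are
    # separate, so a mismatch here is only a hint, not proof of a broken file.
    print("len(tokenizer):", len(tokenizer))
    print("separate_vocabs:", getattr(tokenizer, "separate_vocabs", None))
    print("config.vocab_size:", model.config.vocab_size)
    print("config.decoder_vocab_size:", getattr(model.config, "decoder_vocab_size", None))

    for sentence in src_text:
        # SentencePiece pieces produced by the source-side model, then their ids.
        pieces = tokenizer.tokenize(sentence)
        ids = tokenizer.convert_tokens_to_ids(pieces)
        print(pieces)
        print("unk share:", unk_fraction(ids, tokenizer.unk_token_id))

    batch = tokenizer(src_text, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    for ids in generated.tolist():
        # Target-side pieces before detokenization; special tokens such as
        # <pad> and </s> are kept so nothing is hidden.
        print(tokenizer.convert_ids_to_tokens(ids))
```

For example, `inspect_translation(["2, 4, 6 etc. are even numbers.", "Yes."])` runs the same two sentences as above.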
Could you take a look at it?