OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Different translation results after converting to CTranslate2 #1781

Open hieunguyenquoc opened 1 week ago

hieunguyenquoc commented 1 week ago

Hi. I have fine-tuned the Helsinki-NLP/opus-mt-zh-vi model for translating Chinese to Vietnamese. When I convert the model to CTranslate2, the quality drops (from 32 SacreBLEU with transformers inference to just 28 SacreBLEU with CTranslate2 inference). Can anyone explain why? Thank you. Here is my code:

Conversion command:

```bash
ct2-transformers-converter --model /home/hieunq/Documents/VTP/chinsese_translation/train_chinese_vietnamsese_translation/finetune_helsinky_zh_vi/model/checkpoint-36625 --output_dir zh-vi-ct2 --force --copy_files generation_config.json tokenizer_config.json vocab.json source.spm target.spm
```
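One variable worth ruling out is precision: with `compute_type="auto"`, the runtime picks the fastest supported type (for example float16 on a GPU with FP16 support), which can shift scores slightly. Below is a minimal sketch that keeps float32 end to end; the output directory name `zh-vi-ct2-fp32` is just an illustration:

```python
import ctranslate2
from ctranslate2.converters import TransformersConverter

model_dir = "/home/hieunq/Documents/VTP/chinsese_translation/train_chinese_vietnamsese_translation/finetune_helsinky_zh_vi/model/checkpoint-36625"

# Convert without quantization so the weights keep their original float32 precision.
converter = TransformersConverter(
    model_dir,
    copy_files=["generation_config.json", "tokenizer_config.json", "vocab.json", "source.spm", "target.spm"],
)
converter.convert("zh-vi-ct2-fp32", force=True)

# Force float32 at inference time too, instead of letting "auto" choose.
translator = ctranslate2.Translator("zh-vi-ct2-fp32", device="cuda", compute_type="float32")
```

If the float32 run matches the transformers score, the gap is a precision effect rather than a conversion bug.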

Inference code:

```python
import ctranslate2
import transformers
import time
import torch
import evaluate

metric = evaluate.load("sacrebleu")

device = "cuda" if torch.cuda.is_available() else "cpu"
translator = ctranslate2.Translator(
    "/home/hieunq/Documents/VTP/chinsese_translation/train_chinese_vietnamsese_translation/finetune_helsinky_zh_vi/zh-vi-ct2",
    device=device,
    compute_type="auto",
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "/home/hieunq/Documents/VTP/chinsese_translation/train_chinese_vietnamsese_translation/finetune_helsinky_zh_vi/zh-vi-ct2"
)

f_zh = open("/home/hieunq/Documents/VTP/chinsese_translation/data_version_1_and_2/zh/test_zh/test_zh_version_2_data_Thời_trang_nữ.txt", "r")
f_vi = open("/home/hieunq/Documents/VTP/chinsese_translation/data_version_1_and_2/vi/test_vi/test_vi_version_2_data_Thời_trang_nữ.txt", "r", encoding="utf-8")
texts = f_zh.readlines()

translated_texts = []
start = time.time()

# encode() appends the </s> token that the Marian model expects in its input.
batch_source_tokens = [tokenizer.convert_ids_to_tokens(tokenizer.encode(sentence)) for sentence in texts]

batch_size = 10
results = translator.translate_batch(batch_source_tokens, max_batch_size=batch_size, beam_size=4)

for result in results:
    target = result.hypotheses[0]  # take the best hypothesis
    translated_sentence = tokenizer.decode(tokenizer.convert_tokens_to_ids(target))
    translated_texts.append(translated_sentence)

references = f_vi.readlines()

predictions_texts = [pred.strip() for pred in translated_texts]
references_text = [ref.strip() for ref in references]

result = metric.compute(predictions=predictions_texts, references=references_text)
print(result["score"])
print("Time :", time.time() - start)
```
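Another common source of score gaps is mismatched decoding settings: `model.generate()` in transformers reads its beam count, length penalty, and maximum length from `generation_config.json`, while `translate_batch` has its own defaults (for example `max_decoding_length=256`). A sketch that reuses `translator` and `batch_source_tokens` from the script above; the keys and fallback values are illustrative, so check what your `generation_config.json` actually contains:

```python
import json

# Read the decoding defaults transformers would use, and mirror them in CTranslate2.
with open("zh-vi-ct2/generation_config.json") as f:
    gen_config = json.load(f)

results = translator.translate_batch(
    batch_source_tokens,
    max_batch_size=10,
    beam_size=gen_config.get("num_beams", 4),
    length_penalty=gen_config.get("length_penalty", 1.0),
    max_decoding_length=gen_config.get("max_length", 512),
)
```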

minhthuc2502 commented 1 week ago

Different frameworks may have slightly varied implementations of backend operations, so small differences in scores are expected. You might also want to test with CTranslate2 3.x to see if it brings any improvements.
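To see whether the gap comes from a few outlier sentences or a systematic shift, it can also help to score the two systems' outputs sentence by sentence. A sketch, with hypothetical aligned lists standing in for the real translations from both systems:

```python
import evaluate

metric = evaluate.load("sacrebleu")

# Placeholder data; in practice, load the real outputs from both systems.
hf_outputs = ["xin chào thế giới"]
ct2_outputs = ["chào thế giới"]
references = ["xin chào thế giới"]

# Print per-sentence SacreBLEU wherever the two systems disagree.
for hf, ct2, ref in zip(hf_outputs, ct2_outputs, references):
    if hf != ct2:
        score_hf = metric.compute(predictions=[hf], references=[[ref]])["score"]
        score_ct2 = metric.compute(predictions=[ct2], references=[[ref]])["score"]
        print(f"HF {score_hf:.1f} | CT2 {score_ct2:.1f}")
        print("  HF :", hf)
        print("  CT2:", ct2)
```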

hieunguyenquoc commented 6 days ago

@minhthuc2502 Hi, thanks for your response. I have tried your suggestion, but it still gives the same result. Is there any way to preserve the quality of the CTranslate2 model?