OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Anomalous T5 results using GPU inference on a 4090 graphics card #1679

Open taishan1994 opened 4 months ago

taishan1994 commented 4 months ago

Thank you very much for your work. I'm using ctranslate2 to accelerate inference for https://huggingface.co/Maciel/T5Corrector-base-v2. When I run inference on the CPU the output is normal, but after switching to the GPU the output is always: Response Text: {"translated_text":"..."}. Where could the problem be?

BBC-Esq commented 3 months ago

Wish I could help but it's all in Chinese...what exactly are you trying to do?

taishan1994 commented 3 months ago

This is the code I tested:

ct2-transformers-converter --model T5Corrector-base-v2 --output_dir T5Corrector-base-v2-ct2  --force --quantization float16

import ctranslate2
from transformers import AutoTokenizer

# The tokenizer comes from the original Hugging Face model, not the converted directory.
tokenizer = AutoTokenizer.from_pretrained("Maciel/T5Corrector-base-v2")

# translator = ctranslate2.Translator("T5Corrector-base-v2-ct2", device="cpu")
translator = ctranslate2.Translator("T5Corrector-base-v2-ct2", device="cuda", device_index=0)

input_text = ""
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))
results = translator.translate_batch([input_tokens])

output_tokens = results[0].hypotheses[0]
output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))
print(output_text)

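One possible explanation, offered here as a hypothesis rather than anything confirmed in this thread: the model was converted with --quantization float16. Most CPUs have no native float16 support, so CTranslate2 falls back to float32 there, while the GPU actually executes in float16, whose maximum representable value is only 65504. Large intermediate activations can then overflow to infinity and corrupt the decoder output. A minimal numpy sketch of the overflow (numpy used purely for illustration):

```python
import numpy as np

# float16 tops out at 65504; anything larger overflows to infinity,
# which would then propagate through subsequent computations.
activation = np.float32(70000.0)   # a hypothetical large intermediate value
print(np.float16(activation))      # overflows to inf
print(np.finfo(np.float16).max)    # 65504.0
```

If this is the cause, re-converting the model without --quantization float16, or constructing the Translator with a wider compute type (e.g. compute_type="float32"), may be worth trying to see whether GPU output becomes correct.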
BBC-Esq commented 3 months ago

Sorry, I thought I might be able to help, but I'm not familiar with that model.