OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Inference for ctranslate2 using tensor parallel with mpi #1724

Open mohith56 opened 3 weeks ago

mohith56 commented 3 weeks ago

```python
import ctranslate2
import transformers

# Load the converted OPT-1.3B model with tensor parallelism enabled,
# along with the matching Hugging Face tokenizer.
generator = ctranslate2.Generator("/ct2opt-1.3b", tensor_parallel=True, device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-1.3b")

def generate_text(prompts):
    for prompt in prompts:
        start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
        results = generator.generate_batch(
            [start_tokens], max_length=30, include_prompt_in_result=False
        )
        output = tokenizer.decode(results[0].sequences_ids[0])
    return output

text = ["Hello, I am"]
results = generate_text(text)
print(results)
```

I get 4 copies of the output when I run this script with `mpirun -np 4 python3 ffctranslateload.py`, and the results are very bad, even though the model is distributed across the GPUs. With `-np 1` the results are good. How can I get good results with `-np 4`?
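Note that the duplicated printing itself is expected: `mpirun` launches one copy of the whole script per rank, so an unguarded `print()` runs four times with `-np 4`. Below is a minimal sketch of printing from the master rank only; it assumes the `MpiInfo` helper exposed by recent tensor-parallel builds of CTranslate2, and the `OMPI_COMM_WORLD_RANK` fallback assumes an Open MPI launcher:

```python
import os

import ctranslate2
import transformers

generator = ctranslate2.Generator("/ct2opt-1.3b", tensor_parallel=True, device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-1.3b")

def is_master_rank() -> bool:
    # Recent CTranslate2 releases expose MpiInfo for rank queries; the
    # environment-variable fallback is an assumption about the MPI launcher.
    if hasattr(ctranslate2, "MpiInfo"):
        return ctranslate2.MpiInfo.getCurRank() == 0
    return os.environ.get("OMPI_COMM_WORLD_RANK", "0") == "0"

prompt = "Hello, I am"
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# Every rank must participate in the forward pass of the sharded model...
results = generator.generate_batch(
    [start_tokens], max_length=30, include_prompt_in_result=False
)

# ...but only the master rank prints, so a single output appears.
if is_master_rank():
    print(tokenizer.decode(results[0].sequences_ids[0]))
```

This only removes the duplicate prints; it does not by itself explain the quality drop with `-np 4`.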

minhthuc2502 commented 3 weeks ago

Which version of CTranslate2 are you using? Try version 4.2.1 or 4.3.1.
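If you are unsure, the installed version can be checked from Python (the `__version__` attribute is provided by the wheel):

```python
import ctranslate2

# Confirm the installed CTranslate2 version before retrying with mpirun.
print(ctranslate2.__version__)
```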