Tlntin / Qwen-TensorRT-LLM

MIT License

API for multi-GPU inference #106

Open UIHCRITT opened 5 months ago

UIHCRITT commented 5 months ago

When I use the API method for inference, I get an error with multi-GPU. I also noticed that api.py imports run_old.py; it seems it cannot be used with multiple GPUs?

Tlntin commented 5 months ago

Yeah, api.py does not support multi-GPU. Why not use tritonserver to deploy it instead?

Tlntin commented 5 months ago

Here is some sample code that uses api.py with multiple GPUs: link

UIHCRITT commented 5 months ago

I think that code is not complete. He said: "I got it working, but it's a bit troublesome: you need to call the model once in every process, otherwise it hangs in a dead state." But I can't find that part in his code. Did I misunderstand?

Tlntin commented 5 months ago

Yes, he probably omitted the code that is already in the api.py file.
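For reference, the workaround quoted above ("call the model once in every process, otherwise it hangs") can be sketched roughly as follows. This is a minimal illustration using Python's standard `multiprocessing` in place of the real tensorrt_llm/MPI runtime; `load_model` and the per-rank queues are placeholders, not the actual api.py code.

```python
# Hypothetical sketch of the per-process model-loading pattern.
# Assumption: like tensorrt_llm's MPI runtime, every rank must load the
# engine and participate in each generate() call, or non-zero ranks hang;
# only rank 0 returns the result to the API layer.
import multiprocessing as mp


def load_model(rank):
    # Placeholder for engine loading (e.g. a tensorrt_llm runner).
    # The key point: this runs once in EVERY worker process.
    return f"model-on-gpu-{rank}"


def worker(rank, request_q, result_q):
    model = load_model(rank)  # each process calls the model once at startup
    while True:
        prompt = request_q.get()
        if prompt is None:  # shutdown signal
            break
        # All ranks run the "generate" step together; only rank 0 replies.
        output = f"{model} answered: {prompt}"
        if rank == 0:
            result_q.put(output)


def serve(prompts, world_size=2):
    request_qs = [mp.Queue() for _ in range(world_size)]
    result_q = mp.Queue()
    procs = [
        mp.Process(target=worker, args=(r, request_qs[r], result_q))
        for r in range(world_size)
    ]
    for p in procs:
        p.start()
    results = []
    for prompt in prompts:
        for q in request_qs:  # broadcast each request to every rank
            q.put(prompt)
        results.append(result_q.get())
    for q in request_qs:  # tell all ranks to shut down
        q.put(None)
    for p in procs:
        p.join()
    return results
```

The design point this mirrors: if only the API process loaded the model, the other ranks would never reach the collective generate step and the whole job would appear frozen, which matches the hang described in the linked comment.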