TGI-based model server cannot accelerate inference on V100

WisdomShell / codeshell

A series of code large language models developed by PKU-KCL

http://se.pku.edu.cn/kcl


Open BruceMa29 opened 1 year ago

BruceMa29 commented 1 year ago

@ruixie Hello. Since the V100 cannot use FlashAttention, does that mean inference for the model server cannot be accelerated on a V100 either?

ruixie commented 1 year ago

Hello. A large part of TGI's inference speedup depends on FlashAttention, so the acceleration on a V100 is indeed affected.

BruceMa29 commented 1 year ago

@ruixie OK. Since FlashAttention does not run on the V100, there is no way to get that acceleration. Throughput in load tests with native transformers is very low.
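
For anyone hitting the same problem, a quick way to see whether a GPU is new enough for FlashAttention is to compare its CUDA compute capability against a minimum. The sketch below assumes a threshold of 8.0 (Ampere), which matches the stated requirement of FlashAttention-2; the V100 is Volta, compute capability 7.0. The names `FLASH_ATTN_MIN_CAPABILITY`, `supports_flash_attention`, and `current_gpu_capability` are hypothetical helpers for illustration, and the check TGI performs internally may differ.

```python
# Assumed minimum compute capability for FlashAttention-2 kernels (Ampere, sm_80).
# This threshold is an assumption for illustration; TGI's own check may differ.
FLASH_ATTN_MIN_CAPABILITY = (8, 0)


def supports_flash_attention(capability):
    """Return True if a (major, minor) compute capability meets the assumed minimum."""
    return tuple(capability) >= FLASH_ATTN_MIN_CAPABILITY


def current_gpu_capability():
    """Return the (major, minor) capability of GPU 0, or None if it cannot be read."""
    try:
        import torch  # optional dependency; only needed to query a real device
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = current_gpu_capability()
    if cap is None:
        print("No CUDA device visible (or PyTorch is not installed).")
    else:
        print(f"Compute capability {cap[0]}.{cap[1]}, "
              f"FlashAttention expected to work: {supports_flash_attention(cap)}")
```

Under this assumption, a V100 reports `(7, 0)` and fails the check, while an A100 reports `(8, 0)` and passes.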