Tongjilibo / bert4torch

An elegant PyTorch implementation of transformers
https://bert4torch.readthedocs.io/
MIT License
1.2k stars, 152 forks

LLM: chatGLM2 inference acceleration #156

Open Lxhnnn opened 10 months ago

Lxhnnn commented 10 months ago

How can I improve the inference speed of the GLM2 model?

Tongjilibo commented 10 months ago

glm2 already uses flash_attention and multihead_attention; to speed it up further, you could look into one of the inference acceleration frameworks.
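
For readers wondering what flash_attention buys here, below is a minimal pure-Python sketch of the idea behind it: attention is computed one block of keys/values at a time with a running softmax, so the full score row never has to be held at once. This is not bert4torch's code and not the flash_attention library's API. The function names `naive_attention` and `blockwise_attention` and the `block_size` parameter are made up for illustration, and a real kernel also tiles over queries and runs in on-chip GPU memory.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q.K / sqrt(d)) . V for a single query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim_v)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, visiting keys/values block by block (illustrative only).

    Only a running max, a running normalizer and a running weighted sum
    are kept between blocks.
    """
    d = len(q)
    dim_v = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim_v
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new max
        # (the factor is 0.0 on the first block, where nothing is accumulated yet).
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * scale + sum(weights)
        for j in range(dim_v):
            acc[j] = acc[j] * scale + sum(w * v[j] for w, v in zip(weights, v_blk))
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding for any `block_size`; the blockwise version just trades one large intermediate for a few running values, which is what lets a fused GPU kernel cut memory traffic.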