OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

DeciLM-6b support #1499

Closed: NeonBohdan closed this issue 10 months ago

NeonBohdan commented 1 year ago

An interesting model was released: Deci/DeciLM-6b. It uses the Llama architecture with Grouped-Query Attention (GQA), like Llama-70B.

Is it possible to add support for this model in CTranslate2? And will it be much faster than Llama-7B, as claimed for GQA (though things may be a little different in CTranslate2)?
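For context on the speed claim, here is a minimal NumPy sketch of grouped-query attention (an illustration of the technique, not CTranslate2's implementation). Several query heads share one K/V head, so the K/V cache shrinks by the ratio `num_kv_heads / num_q_heads`, which is where most of the decoding speedup comes from; the shapes and function name below are illustrative.

```python
import numpy as np

def gqa_attention(q, k, v, num_kv_heads):
    """Grouped-query attention: many query heads share fewer K/V heads.

    q: (num_q_heads, seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim)
    """
    num_q_heads, seq_len, head_dim = q.shape
    group_size = num_q_heads // num_kv_heads
    # Repeat each K/V head so every query head in a group attends to it.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `num_kv_heads == num_q_heads` this reduces to standard multi-head attention; with fewer K/V heads, only the K/V projections and cache shrink while the query side is unchanged.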

vince62s commented 10 months ago

We can't support thousands of architecture variations; the way forward is to adapt the existing loaders.
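For anyone following up: CTranslate2 converts Hugging Face models through loaders registered per model config, and once a loader recognizes the architecture, the standard converter CLI handles it. A hypothetical invocation (this will fail until such a loader exists; the output directory name is made up):

```shell
ct2-transformers-converter --model Deci/DeciLM-6b --output_dir decilm-6b-ct2 --quantization int8
```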