There is an interesting model released: Deci/DeciLM-6b.
It uses the Llama architecture with Grouped-Query Attention (GQA), like Llama-70B.
Is it possible to add support for this model in CTranslate2?
And will it be much faster than Llama-7B, as is claimed for GQA (maybe things will be a little different for CTranslate2)?