The model to consider.
https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
I was trying to run the exl2 quants for these models, but I get an error at the rotary embedding: these models use two RoPE scaling factor lists, long_factor and short_factor. The model itself is good, and vLLM and Hugging Face have a merge that supports this, but they don't support exl2.
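For reference, my rough understanding of how the longrope-style selection works (a minimal sketch based on the config fields in the Phi-3 repo — long_factor, short_factor, original_max_position_embeddings — not the actual vLLM/HF implementation):

```python
import math

def rope_inv_freq(seq_len, head_dim, rope_theta, rope_scaling,
                  original_max_pos, max_pos):
    """Sketch of longrope-style RoPE frequency selection.

    rope_scaling is assumed to hold two per-dimension lists of
    length head_dim // 2, as in the Phi-3 config: "short_factor"
    for contexts within the original window, "long_factor" beyond it.
    """
    if seq_len > original_max_pos:
        ext_factors = rope_scaling["long_factor"]
    else:
        ext_factors = rope_scaling["short_factor"]

    # Standard RoPE inverse frequencies, rescaled per dimension pair
    # by the selected factor list.
    inv_freq = [
        1.0 / (f * rope_theta ** (2 * i / head_dim))
        for i, f in zip(range(head_dim // 2), ext_factors)
    ]

    # Magnitude correction applied when the context window is
    # extended past the original training length.
    scale = max_pos / original_max_pos
    if scale <= 1.0:
        mscale = 1.0
    else:
        mscale = math.sqrt(1 + math.log(scale) / math.log(original_max_pos))
    return inv_freq, mscale

# Example: a long prompt selects long_factor (values here are made up).
cfg = {"short_factor": [1.0, 1.0, 1.0, 1.0],
       "long_factor": [1.0, 2.0, 4.0, 8.0]}
inv_freq, mscale = rope_inv_freq(
    seq_len=8192, head_dim=8, rope_theta=10000.0, rope_scaling=cfg,
    original_max_pos=4096, max_pos=131072)
```

This is the part a single-scaling-factor rotary embedding can't express, since the factor list changes with sequence length.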
The closest model Aphrodite already supports.
No response
What's your difficulty of supporting the model you want?
Relevant git merges:
https://github.com/vllm-project/vllm/pull/4298