c0sogi / llama-api

An OpenAI-like LLaMA inference API
MIT License

Support for ExLlama V2 #15

Closed Immortalin closed 11 months ago

Immortalin commented 1 year ago

https://github.com/turboderp/exllamav2

c0sogi commented 1 year ago

ExLlama V2 seems to be working now. Would you like to test it out? Simply pass version=2 to ExllamaModel, as below:

your_gptq_model = ExllamaModel(
    version=2,  # select the ExLlama V2 backend
    model_path="TheBloke/MythoMax-L2-13B-GPTQ",  # automatic download
    max_total_tokens=4096,
)

ehartford commented 10 months ago

Thank you!