PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Feature]: Add Support for aya-23-8b with GGUF #504

Closed. cnmoro closed this issue 1 week ago.

cnmoro commented 3 months ago

🚀 The feature, motivation and pitch

The CohereForAI/aya-23-8B model is new and has very competitive performance. It is currently not supported because the model is of type "CohereForCausalLM", the same as command-r.
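
For reference, the architecture can be confirmed from the model's config on the Hugging Face Hub. A minimal check, assuming a recent transformers release with Cohere support and access to the repo:

from transformers import AutoConfig

# Inspect the architecture declared in the model's config.json
cfg = AutoConfig.from_pretrained("CohereForAI/aya-23-8B")
print(cfg.architectures)  # expected: ['CohereForCausalLM'], same as command-r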

Alternatives

No response

Additional context

No response

sgsdxzy commented 3 months ago

Cohere models are already supported. What error message are you getting?

cnmoro commented 3 months ago

> Cohere models are already supported. What error message are you getting?

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/endpoints/openai/api_server.py", line 562, in <module>
    run_server(args)
  File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/endpoints/openai/api_server.py", line 519, in run_server
    engine = AsyncAphrodite.from_engine_args(engine_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/engine/async_aphrodite.py", line 340, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/engine/args_tools.py", line 539, in create_engine_config
    model_config = ModelConfig(
                   ^^^^^^^^^^^^
  File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/common/config.py", line 137, in __init__
    self.hf_config = get_config(self.model, trust_remote_code, revision,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/transformers_utils/config.py", line 107, in get_config
    return extract_gguf_config(model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/transformers_utils/config.py", line 48, in extract_gguf_config
    raise RuntimeError(f"Unsupported architecture {architecture}, "
RuntimeError: Unsupported architecture command-r, only llama is supported.

sgsdxzy commented 3 months ago

First, it's recommended to use exl2 over GGUF. If you need to use GGUF models for any architecture other than Llama, you need to pre-convert them to a PyTorch state_dict first: https://github.com/PygmalionAI/aphrodite-engine/wiki/8.-Quantization#pre-convert-to-pytorch-state_dict-recommanded
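
For anyone landing here, the gist of that pre-conversion step is to read the tensors out of the GGUF file and save them as a PyTorch state_dict. The following is only a rough sketch of the idea, not the project's converter from the wiki: it assumes the gguf Python package, only copies unquantized F32/F16 tensors (quantized blocks would need dequantization or the real converter), and does not remap llama.cpp tensor names to Hugging Face names. File paths are placeholders.

import numpy as np
import torch
from gguf import GGUFReader, GGMLQuantizationType

reader = GGUFReader("aya-23-8B.Q8_0.gguf")  # hypothetical input file

state_dict = {}
for t in reader.tensors:
    if t.tensor_type in (GGMLQuantizationType.F32, GGMLQuantizationType.F16):
        # GGUF lists dimensions in reverse order relative to PyTorch
        shape = tuple(int(d) for d in t.shape)[::-1]
        arr = np.asarray(t.data).reshape(shape)
        state_dict[t.name] = torch.from_numpy(arr.copy())
    else:
        # quantized tensors are skipped in this sketch
        print(f"skipping quantized tensor {t.name} ({t.tensor_type.name})")

torch.save(state_dict, "aya-23-8B-state_dict.pt")  # hypothetical output path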

cnmoro commented 3 months ago

> First, it's recommended to use exl2 over GGUF. If you need to use GGUF models for any architecture other than Llama, you need to pre-convert them to a PyTorch state_dict first: https://github.com/PygmalionAI/aphrodite-engine/wiki/8.-Quantization#pre-convert-to-pytorch-state_dict-recommanded

I see. I have already tried this model in exl2 format on the exllama2 engine, but it outputs incoherent text 50% of the time. In GGUF on ollama, though, it works flawlessly. That's why I was trying it on aphrodite.

AlpinDale commented 1 week ago

This should work perfectly fine as of v0.6.0. Feel free to re-open the issue if the problem persists.
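
For completeness, a minimal sketch of what this might look like on v0.6.0 or later using aphrodite's offline LLM API, assuming a local GGUF path can be passed as the model (as the traceback above suggests); the path, prompt, and sampling settings are placeholders and have not been verified against this exact model:

from aphrodite import LLM, SamplingParams

# hypothetical local path to the GGUF file
llm = LLM(model="/models/aya-23-8B.Q5_K_M.gguf")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Write a one-line greeting."], params)
print(outputs[0].outputs[0].text)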