Closed: cnmoro closed this issue 1 week ago
Cohere models are already supported; what error message are you getting?
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/endpoints/openai/api_server.py", line 562, in <module>
run_server(args)
File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/endpoints/openai/api_server.py", line 519, in run_server
engine = AsyncAphrodite.from_engine_args(engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/engine/async_aphrodite.py", line 340, in from_engine_args
engine_config = engine_args.create_engine_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/engine/args_tools.py", line 539, in create_engine_config
model_config = ModelConfig(
^^^^^^^^^^^^
File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/common/config.py", line 137, in __init__
self.hf_config = get_config(self.model, trust_remote_code, revision,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/transformers_utils/config.py", line 107, in get_config
return extract_gguf_config(model)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/moro/miniconda3/envs/aphrodite/lib/python3.11/site-packages/aphrodite/transformers_utils/config.py", line 48, in extract_gguf_config
raise RuntimeError(f"Unsupported architecture {architecture}, "
RuntimeError: Unsupported architecture command-r, only llama is supported.
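For context, the check that raises this error can be sketched like this (a simplified, hypothetical reconstruction of the gate in aphrodite's `extract_gguf_config`; the set name and function signature here are illustrative, not the actual source):

```python
# Only llama-architecture GGUF files can have their config extracted directly;
# any other architecture (e.g. command-r) is rejected up front.
SUPPORTED_GGUF_ARCHITECTURES = {"llama"}

def check_gguf_architecture(architecture: str) -> None:
    """Raise if the GGUF metadata's general.architecture is unsupported."""
    if architecture not in SUPPORTED_GGUF_ARCHITECTURES:
        raise RuntimeError(
            f"Unsupported architecture {architecture}, "
            f"only llama is supported."
        )
```

Since command-r GGUF files fail this gate, loading them requires converting the weights first, as suggested below.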
First, it is recommended to use exl2 over GGUF. If you need to use GGUF models for anything other than llama, you need to convert them first: https://github.com/PygmalionAI/aphrodite-engine/wiki/8.-Quantization#pre-convert-to-pytorch-state_dict-recommanded
I see. I have already tried this model in exl2 format on the exllama2 engine, but it outputs incoherent text about 50% of the time, whereas in GGUF on Ollama it works flawlessly. That's why I was trying it on Aphrodite.
This should work perfectly fine as of v0.6.0. Feel free to re-open the issue if the problem persists.
🚀 The feature, motivation and pitch
The CohereForAI/aya-23-8B model is new and has very competitive performance. It is currently not supported because the model is of type "CohereForCausalLM", the same as command-r.
Alternatives
No response
Additional context
No response