bentoml / OpenLLM

Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

feat: c4ai-command-r-v01 support #944

Closed: 0x77dev closed this issue 3 months ago

0x77dev commented 5 months ago

Feature request

It would be nice to have the ability to run Command-R (CohereForAI/c4ai-command-r-v01) using OpenLLM.

Motivation

No response

Other

vLLM backend already supports Command-R in v0.4.0: https://github.com/vllm-project/vllm/issues/3330#issuecomment-2041225404

0x77dev commented 5 months ago

The current ghcr.io/bentoml/openllm:latest image (sha256:1860863091163a8e8cb1225c99d6e1b0735c11871e14e8d8424a22a5ad6742fa) shows an error:

ValueError: The checkpoint you are trying to load has a model type of `cohere`, which Transformers does not recognize. This may be due to a problem with the checkpoint or an outdated version of Transformers.

when running:

docker run --rm --gpus all -p 3000:3000 -it ghcr.io/bentoml/openllm start CohereForAI/c4ai-command-r-v01 --backend vllm
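
That error comes from Transformers' auto-config registry, which raises exactly this ValueError when it has no entry for the checkpoint's model_type. A minimal sketch to confirm it (the registry name is a real Transformers internal; the version cutoff is an assumption based on Cohere support landing around transformers 4.39):

import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

# Older Transformers builds have no "cohere" entry in the auto-config
# registry, which is what triggers the ValueError above.
print(transformers.__version__)
print("cohere" in CONFIG_MAPPING_NAMES)  # False on builds that predate Cohere support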

Also, installing openllm[vllm] pulls in vLLM 0.2.7,

even though the vLLM version pinned on the main branch is 0.4.0: https://github.com/bentoml/OpenLLM/blob/main/openllm-core/pyproject.toml#L83 and https://github.com/bentoml/OpenLLM/blob/main/tools/dependencies.py#L157
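
For anyone hitting the same mismatch, a quick sketch to confirm which vLLM wheel actually resolved into the environment (Command-R needs the 0.4.0 support referenced above):

import importlib.metadata

# openllm[vllm] reportedly resolves to vLLM 0.2.7 even though main pins 0.4.0;
# this prints the version that is actually installed.
print(importlib.metadata.version("vllm"))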

GaetanBaert commented 5 months ago

I think it should use the same prompting system. CohereForAI/c4ai-command-r-plus is also available, and it would be nice to be able to run it too.
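
A minimal sketch supporting this (assumes Hub access and a Transformers build recent enough to load the checkpoints): both Command-R repos ship a chat template in their tokenizer config, so one prompt-formatting path should cover both. The example message is illustrative.

from transformers import AutoTokenizer

# Both checkpoints publish a chat template, so the same formatting code
# applies to Command-R and Command-R+.
for repo in ("CohereForAI/c4ai-command-r-v01", "CohereForAI/c4ai-command-r-plus"):
    tok = AutoTokenizer.from_pretrained(repo)
    prompt = tok.apply_chat_template(
        [{"role": "user", "content": "Hello!"}],
        tokenize=False,
        add_generation_prompt=True,
    )
    print(repo, "->", prompt[:80])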

aarnphm commented 3 months ago

should be supported on main now. Will release a new version soon.
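
Once a release lands, a minimal smoke test against a running server might look like the sketch below (not an official example: the port matches the docker command above, while the base URL path and model name are assumptions about OpenLLM's OpenAI-compatible endpoint):

from openai import OpenAI

# Point the stock OpenAI client at the local OpenLLM server; the API key is
# unused locally, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
resp = client.chat.completions.create(
    model="CohereForAI/c4ai-command-r-v01",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)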