Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
https://anythingllm.com

[FEAT]: Integration of Vllm as model server #1153

Open flefevre opened 4 months ago

flefevre commented 4 months ago

What would you like to see?

It would be great to be able to configure AnythingLLM with a vLLM model server: https://github.com/vllm-project/vllm

mkhludnev commented 4 months ago

I was able to use vLLM by selecting Local AI in the AnythingLLM LLM settings. Enjoy.

flefevre commented 4 months ago

Thanks for your advice.

I have tested it but failed. I do confirm the vLLM instance is working fine, since https://vllm-mixtral.myserver.fr/v1/models returns:

{"object":"list","data":[{"id":"mistralai/Mixtral-8x7B-Instruct-v0.1","object":"model","created":1714112327,"owned_by":"vllm","root":"mistralai/Mixtral-8x7B-Instruct-v0.1","parent":null,"permission":[{"id":"modelperm-76d249bf4f0e44698e3bb82a41424183","object":"model_permission","created":1714112327,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

I have put LocalAI with http://vllm-mistral:5002/v1 in the config, and AnythingLLM is able to retrieve the model.

But when I tried to start a chat, I got an error: Could not respond to message.

Request failed with status code 400

Looking at the AnythingLLM logs, I have the following trace:

I would appreciate your help. Thanks in advance. Francois, from France


      _events: [Object: null prototype],
      _eventsCount: 1,
      _maxListeners: undefined,
      socket: [Socket],
      httpVersionMajor: 1,
      httpVersionMinor: 1,
      httpVersion: '1.1',
      complete: true,
      rawHeaders: [Array],
      rawTrailers: [],
      joinDuplicateHeaders: undefined,
      aborted: false,
      upgrade: false,
      url: '',
      method: null,
      statusCode: 400,
      statusMessage: 'Bad Request',
      client: [Socket],
      _consuming: false,
      _dumped: false,
      req: [ClientRequest],
      responseUrl: 'http://vllm-mixtral:5002/v1/chat/completions',
      redirects: [],
      [Symbol(kCapture)]: false,
      [Symbol(kHeaders)]: [Object],
      [Symbol(kHeadersCount)]: 10,
      [Symbol(kTrailers)]: null,
      [Symbol(kTrailersCount)]: 0
    }
  },
  isAxiosError: true,
  toJSON: [Function: toJSON]
}


flefevre commented 4 months ago

If it can help your analysis: AnythingLLM seems to call https://vllm-mixtral.myserver.fr/v1/chat/completions, but when I go to this URL I get: {"detail":"Method Not Allowed"}

Is it an API problem between vLLM and AnythingLLM?

timothycarambat commented 4 months ago

That endpoint is POST only, not GET - which is part of the reason you got method not allowed when going to the URL directly.
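For reference (this example is mine, not from the thread), a minimal valid call uses POST with a JSON body; opening the same URL in a browser issues a GET, hence the Method Not Allowed. The host, port, and model name below are the ones quoted earlier:

```sh
# Minimal sketch of a valid chat request against the vLLM OpenAI-compatible server.
# Host, port, and model name are taken from the earlier comments in this thread.
curl -X POST http://vllm-mixtral:5002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
      }'
```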

alceausu commented 4 months ago

I have the same issue, no matter the integration (Local AI or Generic OpenAI). The vLLM server replies with:

ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
INFO: "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
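To make that error concrete (illustrative payload, not taken from the logs): Mixtral-Instruct's default chat template only accepts a strictly alternating user/assistant sequence, so a request that opens with a system message is rejected before any generation happens:

```jsonc
// Illustrative request body that triggers "Conversation roles must alternate ..."
// under Mixtral-Instruct's default chat template (placeholder content).
{
  "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ]
}
```

Dropping the system message, or folding it into the first user turn, lets the same request go through.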

flefevre commented 4 months ago

Dear all, I have tested again and there is no way to use vLLM directly, even though I have exposed the port through the Docker configuration. Could you share exactly how you did it? Thanks again

mkhludnev commented 4 months ago

@flefevre once again, I did it twice and it works. Maybe it's a container connectivity issue? I remember using curl to check connectivity between containers. Can you tell how your containers, hosts and processes are arranged? The simplest approach for me was to launch vLLM and AnythingLLM as sibling containers under a single docker-compose.yml and point AnythingLLM to vLLM via the container name, as sketched below. Another approach was to run vLLM on the host, launch AnythingLLM via Docker, and point it to vLLM via host.docker.internal, but IIRC that is a Docker Desktop-only feature.
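A rough sketch of that sibling-container setup (image tags, ports, model, and GPU settings are assumptions and will need adjusting for your environment):

```yaml
# Hypothetical docker-compose.yml: vLLM and AnythingLLM as sibling services.
# AnythingLLM's LLM base URL would then be http://vllm:8000/v1 (container name).
version: "3.8"
services:
  vllm:
    image: vllm/vllm-openai:latest          # assumed image tag
    command: --model mistralai/Mixtral-8x7B-Instruct-v0.1
    ports:
      - "8000:8000"
    # GPU reservation omitted here; add it for a real deployment.
  anythingllm:
    image: mintplexlabs/anythingllm:latest  # assumed image tag
    ports:
      - "3001:3001"
    depends_on:
      - vllm
```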

alceausu commented 4 months ago

It seems these are two different issues: one related to connectivity, the other to the request format. On the format side, anything-llm can reach vLLM, but vLLM throws a '400 Bad Request' error:

ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...

For some hints on why, see the vLLM discussion "Mixtral instruct doesn't accept system prompt". Is there a way to modify the template on the anything-llm side?
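One possible workaround (my suggestion, not discussed above) is on the vLLM side rather than in anything-llm: the OpenAI-compatible server accepts a --chat-template flag, so a custom template that folds system messages into the first user turn could be supplied. The template file name here is a placeholder:

```sh
# Sketch: serve Mixtral with a custom chat template that tolerates a system message.
# ./mixtral_with_system.jinja is hypothetical; you would need to write it yourself.
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --port 5002 \
  --chat-template ./mixtral_with_system.jinja
```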

flefevre commented 4 months ago

Dear all, I have now simplified the test.

Docker configuration

Docker validation

When I connect to the anythingllm container, I am able to query the vLLM model with the following command:

anythingllm@6de6c5255f33:~$ curl http://vllm-mixtral:5002/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "prompt": "San Francisco is a", "max_tokens": 7, "temperature": 0}'

{"id":"cmpl-0df1e0e95b4c46a78632936ba277e3ef","object":"text_completion","created":1714551853,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"text":" city that is known for its steep","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7}}
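A natural follow-up check from the same container (not part of the original test) is the chat endpoint with a leading system message; if the chat template is the culprit, this should reproduce the 400 seen in the AnythingLLM trace:

```sh
# Hypothetical reproduction of the 400: same host and model, but via
# /v1/chat/completions with a leading system message.
curl http://vllm-mixtral:5002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "messages": [
          { "role": "system", "content": "You are a helpful assistant." },
          { "role": "user", "content": "San Francisco is a" }
        ]
      }'
```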

AnythingLLM WebUI configuration: I am able to configure the default LLM preference by setting LocalAI with the vLLM base URL, as described above.

AnythingLLM WebUI test: when I create a workspace, open a new thread, and ask something, every time I get "Could not respond to message. Request failed with status code 400".

When I look at the AnythingLLM logs, I see the following trace:

      httpVersionMajor: 1,
      httpVersionMinor: 1,
      httpVersion: '1.1',
      complete: true,
      rawHeaders: [Array],
      rawTrailers: [],
      joinDuplicateHeaders: undefined,
      aborted: false,
      upgrade: false,
      url: '',
      method: null,
      statusCode: 400,
      statusMessage: 'Bad Request',
      client: [Socket],
      _consuming: false,
      _dumped: false,
      req: [ClientRequest],
      responseUrl: 'http://vllm-mixtral:5002/v1/chat/completions',
      redirects: [],
      [Symbol(kHeaders)]: [Object],
      [Symbol(kHeadersCount)]: 10,
      [Symbol(kTrailers)]: null,
      [Symbol(kTrailersCount)]: 0
    }
  },
  isAxiosError: true,
  toJSON: [Function: toJSON]
}

Analysis: I agree with @alceausu, the problem does not seem to come from a misconfiguration of Docker / vLLM / AnythingLLM. It seems more related to a mismatch between AnythingLLM and vLLM in how a specific model, in my case Mixtral 8x7B, expects to be prompted. A solution could be to account for the system-prompt specificity of each model served by vLLM, as proposed by @alceausu (https://github.com/vllm-project/vllm/discussions/2112), or to use a model proxy such as LiteLLM that encapsulates the model interaction behind a uniform API inspired by OpenAI's.
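For illustration, a LiteLLM proxy in front of vLLM could be configured roughly like this (the keys follow LiteLLM's proxy config format as I understand it; the model alias and URLs are placeholders). AnythingLLM would then point at the proxy's OpenAI-compatible endpoint instead of vLLM directly:

```yaml
# Hypothetical LiteLLM proxy config.yaml: expose the vLLM-served Mixtral model
# behind a uniform OpenAI-compatible API.
model_list:
  - model_name: mixtral-8x7b                      # alias AnythingLLM would select
    litellm_params:
      model: openai/mistralai/Mixtral-8x7B-Instruct-v0.1
      api_base: http://vllm-mixtral:5002/v1
      api_key: "none"                             # vLLM here is unauthenticated
```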

I have created the following feature proposal: #1154. I do think it is the right solution. Do you agree?

If yes, this ticket should perhaps be closed as invalid, since AnythingLLM is compatible with vLLM but not with all models served by vLLM. Mixtral 8x7B is really a good model, and it would be ideal to access it through a proxy such as LiteLLM, so that the AnythingLLM developers do not have to adapt their backend prompting for each model.

Thanks for your expertise.