Open flefevre opened 4 months ago
I was able to use vLLM by selecting Local AI in the AnythingLLM LLM Settings. Enjoy.
Thanks for your advice.
I have tested it but failed. I confirm the vllm instance is working fine, since https://vllm-mixtral.myserver.fr/v1/models returns:
{"object":"list","data":[{"id":"mistralai/Mixtral-8x7B-Instruct-v0.1","object":"model","created":1714112327,"owned_by":"vllm","root":"mistralai/Mixtral-8x7B-Instruct-v0.1","parent":null,"permission":[{"id":"modelperm-76d249bf4f0e44698e3bb82a41424183","object":"model_permission","created":1714112327,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
I have put LocalAI with http://vllm-mistral:5002/v1 in the config, and AnythingLLM is able to retrieve the model.
But when I try to engage a chat, I get an error: Could not respond to message.
Request failed with status code 400
Looking at the AnythingLLM log, I have the trace below. I would appreciate your help. Thanks in advance. Francois, from France
```
_events: [Object: null prototype],
_eventsCount: 1,
_maxListeners: undefined,
socket: [Socket],
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
rawHeaders: [Array],
rawTrailers: [],
joinDuplicateHeaders: undefined,
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 400,
statusMessage: 'Bad Request',
client: [Socket],
_consuming: false,
_dumped: false,
req: [ClientRequest],
responseUrl: 'http://vllm-mixtral:5002/v1/chat/completions',
redirects: [],
[Symbol(kCapture)]: false,
[Symbol(kHeaders)]: [Object],
[Symbol(kHeadersCount)]: 10,
[Symbol(kTrailers)]: null,
[Symbol(kTrailersCount)]: 0
}
},
isAxiosError: true,
toJSON: [Function: toJSON]
}
```
If it helps your analysis: it seems to request https://vllm-mixtral.myserver.fr/v1/chat/completions
but when I go to this URL I get:
{"detail":"Method Not Allowed"}
Is it an API problem between vllm and anythingllm?
That endpoint is POST-only, not GET, which is part of the reason you got Method Not Allowed when opening the URL directly in a browser.
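For reference, a minimal sketch of the JSON body such a POST carries (the model id is taken from the /v1/models output above; the parameters shown are illustrative assumptions):

```python
import json

# Build the JSON body that POST /v1/chat/completions expects.
# A browser request is a GET with no body, hence the
# {"detail":"Method Not Allowed"} reply from vllm.
def chat_request_body(model, user_message):
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 64,
        "temperature": 0,
    })

print(chat_request_body("mistralai/Mixtral-8x7B-Instruct-v0.1", "Hello"))
```

The same body can be sent from the command line with `curl -X POST ... -d '<body>'`, as in the completions test further down the thread.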
I have the same issue, no matter the integration (Local AI or Generic OpenAI).
The vllm server replies with:

```
ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
INFO:     "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
```
Dear all, I have tested again and there is no way to use vllm directly, even though I have exposed the port through the Docker configuration. Could you share exactly how you did it? Thanks again.
@flefevre once again, I have done it twice and it works. Maybe it's a container connectivity issue? I remember using curl to check connectivity between the containers. Can you tell us how your containers, hosts, and processes are arranged?
The simplest approach for me was to launch vllm and AnythingLLM as sibling containers under a single docker-compose.yml config and point AnythingLLM to vllm via the container name.
Another approach was to run vllm on the host, then launch AnythingLLM via Docker and point it to vllm via host.docker.internal, but IIRC that is a Docker Desktop-only feature.
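A rough sketch of the sibling-container setup described above (image tags, ports, and the model flag are assumptions; adjust to your deployment):

```yaml
# docker-compose.yml — hypothetical sketch, not a tested config
services:
  vllm-mixtral:
    image: vllm/vllm-openai:latest
    command: --model mistralai/Mixtral-8x7B-Instruct-v0.1 --port 5002
    ports:
      - "5002:5002"
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    ports:
      - "3001:3001"
    depends_on:
      - vllm-mixtral
```

In the AnythingLLM settings, the base URL would then be http://vllm-mixtral:5002/v1, since the service name resolves on the shared compose network.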
It seems these are two different issues: one related to connectivity, and the other to request format.
Regarding the request format, anything-llm can reach vllm, but vllm returns a 400 Bad Request error:
```
ERROR serving_chat.py:60] Error in applying chat template from request: Conversation roles must alternate user/assistant/user/assistant/...
```
For some hints on why, see the vllm discussion "Mixtral instruct doesn't accept system prompt" (https://github.com/vllm-project/vllm/discussions/2112).
Is there a way to modify the template on the anything-llm side?
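AnythingLLM doesn't expose the template, but a small proxy layer in front of vllm could fold the system prompt into the first user turn so the roles strictly alternate. A hypothetical sketch of that transformation (not AnythingLLM or vllm code):

```python
def fold_system_prompt(messages):
    """Merge a leading system message into the first user message so the
    conversation satisfies Mixtral's strict user/assistant alternation."""
    if not messages or messages[0].get("role") != "system":
        return list(messages)
    system, rest = messages[0], messages[1:]
    if rest and rest[0].get("role") == "user":
        merged = {
            "role": "user",
            "content": system["content"] + "\n\n" + rest[0]["content"],
        }
        return [merged] + rest[1:]
    # No user turn to merge into; re-emit the system text as a user turn.
    return [{"role": "user", "content": system["content"]}] + rest

msgs = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]
print(fold_system_prompt(msgs))
# → [{'role': 'user', 'content': 'You are helpful.\n\nHi'}]
```

This is essentially what a proxy such as LiteLLM can do per model; a fix on the AnythingLLM side would need the same kind of per-model prompt handling.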
Dear all, I have simplified the test.
Docker configuration
Docker validation
When I connect to the anythingllm container, I am able to query the vllm model with the following command:

```
anythingllm@6de6c5255f33:~$ curl http://vllm-mixtral:5002/v1/completions -H "Content-Type: application/json" -d '{"model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "prompt": "San Francisco is a", "max_tokens": 7, "temperature": 0}'
{"id":"cmpl-0df1e0e95b4c46a78632936ba277e3ef","object":"text_completion","created":1714551853,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"text":" city that is known for its steep","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7}}
```
Anythingllm Webui configuration: I am able to configure the default LLM preference by setting
Anythingllm Webui Test
When I create a workspace, open a new thread, and ask something, I always get:
Could not respond to message. Request failed with status code 400
When I look at the Anythingllm logs, I have the following trace:

```
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
rawHeaders: [Array],
rawTrailers: [],
joinDuplicateHeaders: undefined,
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 400,
statusMessage: 'Bad Request',
client: [Socket],
_consuming: false,
_dumped: false,
req: [ClientRequest],
responseUrl: 'http://vllm-mixtral:5002/v1/chat/completions',
redirects: [],
[Symbol(kHeaders)]: [Object],
[Symbol(kHeadersCount)]: 10,
[Symbol(kTrailers)]: null,
[Symbol(kTrailersCount)]: 0
}
},
isAxiosError: true,
toJSON: [Function: toJSON]
```
Analysis: I agree with @alceausu, the problem does not seem to come from a misconfiguration of docker / vllm / anythingllm. It seems more related to a mismatch between anythingllm and vllm in the usage of a specific model, which in my case is Mixtral8x7b. A solution could be to understand the prompting specifics of each vllm model, as proposed by @alceausu in https://github.com/vllm-project/vllm/discussions/2112. Or perhaps to use a model proxy such as LiteLLM, which encapsulates the model interaction behind a uniform, OpenAI-inspired API.
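As an illustration of that proxy idea, a LiteLLM proxy config along these lines could sit between AnythingLLM and vllm (the model alias and schema details here are assumptions; check the LiteLLM docs before using):

```yaml
# litellm config.yaml — hypothetical sketch
model_list:
  - model_name: mixtral            # alias AnythingLLM would see
    litellm_params:
      model: openai/mistralai/Mixtral-8x7B-Instruct-v0.1
      api_base: http://vllm-mixtral:5002/v1
```

AnythingLLM would then point at the LiteLLM endpoint instead of vllm directly, and the proxy would take care of model-specific prompt handling.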
I have created a feature proposal here: #1154. I do think it is the right solution. Do you agree?
If so, my ticket should perhaps be closed as invalid, since Anythingllm is compatible with vllm but not with all models served by vllm. Mixtral8x7b is really a good model. It would be perfect to access it through a proxy such as LiteLLM, so the Anythingllm developers do not have to adapt their backend prompting for each model.
Thanks for your expertise.
What would you like to see?
It would be great to be able to configure AnythingLLM with a vLLM model: https://github.com/vllm-project/vllm