lludlow opened 7 months ago
It is already possible to integrate Cloudflare Workers AI with LibreChat by configuring a custom AI provider through the LiteLLM proxy, which translates OpenAI-style calls into requests the Workers AI API can understand.
https://autoize.com/llama-3-on-cloudflare-workers-ai-ai-at-the-edge/
I even have it functioning with the latest model, Llama 3 8B Instruct.
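For reference, a minimal LiteLLM proxy config along those lines might look like the sketch below. The model alias, the cloudflare/ prefix, and the environment-variable wiring reflect my understanding of LiteLLM's Cloudflare Workers AI provider; verify the exact fields against the LiteLLM docs before relying on them.

```yaml
# Sketch: expose a Workers AI model behind LiteLLM's OpenAI-compatible proxy.
# Assumes CLOUDFLARE_API_KEY and CLOUDFLARE_ACCOUNT_ID are set in the
# environment, per LiteLLM's Cloudflare provider conventions.
model_list:
  - model_name: llama-3-8b-instruct   # alias LibreChat will see; name is arbitrary
    litellm_params:
      model: cloudflare/@cf/meta/llama-3-8b-instruct
      api_key: os.environ/CLOUDFLARE_API_KEY
```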
Cloudflare Workers AI also has official OpenAI-compatible API support.
However, I always encounter this error:
"The response is incomplete; it's either still processing, was cancelled, or censored. Refresh or try a different prompt."
I don't know what's wrong.
Here's my librechat.yaml config:

endpoints:
  custom:
    - name: 'WorkerAI'
      apiKey: '${WORKERAI_KEY}'
      baseURL: 'https://api.cloudflare.com/client/v4/accounts/{my account id}/ai/v1'
      models:
        default: [
          "@cf/qwen/qwen1.5-14b-chat-awq",
          "@hf/meta-llama/meta-llama-3-8b-instruct",
        ]
        fetch: false
      titleConvo: true
      titleModel: '@cf/qwen/qwen1.5-14b-chat-awq'
      # Recommended: drop the stop parameter from the request, as these models use a variety of stop tokens.
      dropParams: ['stop']
      modelDisplayLabel: 'Worker AI'
Can someone please add Workers AI support to librechat.yaml?
Hi, I'm using Workers AI without any issues!
This is my librechat.yaml:
endpoints:
  custom:
    - name: 'WorkersAI'
      apiKey: '${WORKERS_AI_API_KEY}'
      baseURL: 'https://api.cloudflare.com/client/v4/accounts/<account_id>/ai/v1'
      models:
        default: ['@cf/meta/llama-3.1-8b-instruct']
        fetch: true
      titleConvo: true
      titleModel: '@cf/meta/llama-3.1-8b-instruct'
      summarize: false
      summaryModel: '@cf/facebook/bart-large-cnn'
      forcePrompt: false
      modelDisplayLabel: 'Workers AI'
I've also written a fetcher for the models, which you can find here: https://github.com/marcozac/LibreChat/tree/workers-ai.
It seems to work and passes the tests, but I believe it still needs additional testing and a few more pieces, such as support for max_input_tokens, max_total_tokens, etc.
Also, I haven't figured out yet why the first message needs to be stopped manually, while all the others work fine without any problems.
After some testing, I encountered several errors due to incompatibilities between the format of the received chunks and the format OpenAI clients accept, such as the lack of the role field, as well as messages with empty bodies that cause errors on both sides.
So, I wrote (with a bit of copying from here and there) a chatCompletion handler that routes requests directly to the model endpoints. More or less the same structure can be reused for other types of requests as well.
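To illustrate the kind of translation involved, here is a minimal TypeScript sketch (this is not code from the branch above; the Workers AI chunk shape with a response field is an assumption based on its native streaming format, and toOpenAIChunk is a hypothetical name):

```typescript
// Assumed shape of a Workers AI native stream chunk: { "response": "<token>" }.
type WorkersAIChunk = { response?: string };

type OpenAIChunk = {
  object: 'chat.completion.chunk';
  choices: { index: number; delta: { role?: 'assistant'; content?: string } }[];
};

// Normalize one Workers AI chunk into an OpenAI-style chat.completion.chunk,
// or return null to drop it.
function toOpenAIChunk(chunk: WorkersAIChunk, isFirst: boolean): OpenAIChunk | null {
  const content = chunk.response ?? '';
  // Drop empty bodies instead of forwarding them; they caused errors on both sides.
  if (!isFirst && content.length === 0) {
    return null;
  }
  return {
    object: 'chat.completion.chunk',
    choices: [{
      index: 0,
      // OpenAI-style clients expect the first delta to carry the role.
      delta: isFirst ? { role: 'assistant', content } : { content },
    }],
  };
}
```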
It seems to work both when calling Cloudflare's APIs directly and with the AI Gateway, but I still need to write formal tests. I plan to do that in the next few days.
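For anyone who wants to test the AI Gateway path, only the baseURL should need to change. A sketch, assuming the gateway's OpenAI-compatible Workers AI route (double-check the path against Cloudflare's AI Gateway docs, and swap in your own account and gateway IDs):

```yaml
      # Same custom endpoint as above, routed through AI Gateway instead of
      # calling the Workers AI API directly.
      baseURL: 'https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/workers-ai/v1'
```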
If anyone has any suggestions, they are definitely welcome! Once it’s more or less stable, I might open a PR.
Thanks so much @marcozac
What features would you like to see added?
Add support for chat models, but also embeddings, image generation, TTS, etc., via plugin for Cloudflare Workers AI.
More details
Cloudflare provides various AI models as well as a generous free allocation.
Which components are impacted by your request?
Endpoints