danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/
MIT License
19.29k stars · 3.22k forks

Enhancement: Add Cloudflare Workers AI #2317

Open lludlow opened 7 months ago

lludlow commented 7 months ago

What features would you like to see added?

Add support for chat models, as well as embeddings, image generation, TTS, etc., via a plugin for Cloudflare Workers AI.

More details

Cloudflare provides a variety of AI models as well as a generous free allocation.

Which components are impacted by your request?

Endpoints

Pictures

No response


whalygood commented 7 months ago

It is already possible to integrate Cloudflare Workers AI with LibreChat by configuring a custom AI provider through the LiteLLM proxy, which translates the OpenAI-style calls into something the Workers AI API can understand.
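
For reference, a LiteLLM proxy config along these lines can expose a Workers AI model behind an OpenAI-compatible route. This is a sketch based on LiteLLM's `cloudflare/` provider prefix and its usual environment-variable conventions, not a config taken from the linked article:

```yaml
# litellm_config.yaml — sketch, assuming LiteLLM's `cloudflare/` provider
model_list:
  - model_name: llama-3-8b-instruct
    litellm_params:
      model: cloudflare/@cf/meta/llama-3-8b-instruct
      api_key: os.environ/CLOUDFLARE_API_KEY   # Workers AI API token
# LiteLLM is also expected to read CLOUDFLARE_ACCOUNT_ID from the environment.
```

Running `litellm --config litellm_config.yaml` and pointing a LibreChat custom endpoint at the proxy's base URL should then give LibreChat an OpenAI-shaped view of the model.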

https://autoize.com/llama-3-on-cloudflare-workers-ai-ai-at-the-edge/

I even have it functioning with the latest model, Llama 3 8B Instruct.

HundSimon commented 5 months ago

Cloudflare Workers AI has official OpenAI-compatible API support. However, I always encounter the error "The response is incomplete; it's either still processing, was cancelled, or censored. Refresh or try a different prompt." I don't know what's wrong. Here's my librechat.yaml config:

- name: 'WorkerAI'
  apiKey: '${WORKERAI_KEY}'
  baseURL: 'https://api.cloudflare.com/client/v4/accounts/{my account id}/ai/v1'
  models:
    default: [
      "@cf/qwen/qwen1.5-14b-chat-awq",
      "@hf/meta-llama/meta-llama-3-8b-instruct"
      ]
    fetch: false
  titleConvo: true
  titleModel: '@cf/qwen/qwen1.5-14b-chat-awq'
  # Recommended: drop the stop parameter from the request, since these models use a variety of stop tokens.
  dropParams: ['stop']
  modelDisplayLabel: 'Worker AI'

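
As a sanity check for errors like the one above, the OpenAI-compatible endpoint can be exercised directly, outside LibreChat. The command below is a usage sketch: `$ACCOUNT_ID` and `$WORKERAI_KEY` are placeholders for your own account ID and API token, and the path mirrors the `baseURL` in the config:

```
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/v1/chat/completions" \
  -H "Authorization: Bearer $WORKERAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "@cf/qwen/qwen1.5-14b-chat-awq",
       "messages": [{"role": "user", "content": "Say hello"}]}'
```

If this returns a complete response while LibreChat still truncates, the problem is likely in how the streamed chunks are interpreted rather than in the endpoint itself.
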
kneelesh48 commented 4 months ago

Can someone please add Workers AI support to librechat.yaml?

marcozac commented 3 months ago

Hi, I'm using Workers AI without any issues!

This is my librechat.yaml:

endpoints:
  custom:
    - name: 'WorkersAI'
      apiKey: '${WORKERS_AI_API_KEY}'
      baseURL: 'https://api.cloudflare.com/client/v4/accounts/<account_id>/ai/v1'
      models:
        default: ['@cf/meta/llama-3.1-8b-instruct']
        fetch: true
      titleConvo: true
      titleModel: '@cf/meta/llama-3.1-8b-instruct'
      summarize: false
      summaryModel: '@cf/facebook/bart-large-cnn'
      forcePrompt: false
      modelDisplayLabel: 'Workers AI'

I've also written a fetcher for the models, which you can find here: https://github.com/marcozac/LibreChat/tree/workers-ai.

It seems to work and passes the tests, but I believe it still needs some additional testing and some follow-up work, such as support for max_input_tokens, max_total_tokens, etc.

Also, I haven't figured out yet why the first message needs to be stopped manually, while all the others work fine.

[Screenshot: librechat-workersai-prompt]

marcozac commented 3 months ago

After some testing, I encountered several errors due to incompatibilities between the format of the received chunks and the format OpenAI clients accept, such as a missing role field, as well as chunks with empty bodies that cause errors on both sides.
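
The two incompatibilities described above can be smoothed over by normalizing each streamed chunk before handing it to the client. This is a minimal sketch (the function name and exact chunk shapes are illustrative, not taken from the branch): it injects a default `role` when Workers AI omits it and drops chunks with no choices at all.

```python
import json

def normalize_chunk(raw: str):
    """Normalize one SSE "data:" line from Workers AI toward the
    OpenAI chat.completion.chunk shape, or return None to skip it."""
    line = raw.strip()
    if not line or line == "data: [DONE]":
        return None  # keep-alive line or stream terminator
    payload = json.loads(line.removeprefix("data: "))
    choices = payload.get("choices") or []
    if not choices:
        return None  # empty-body chunk: nothing for the client to render
    for choice in choices:
        delta = choice.setdefault("delta", {})
        # Workers AI chunks may omit the role that OpenAI clients expect.
        delta.setdefault("role", "assistant")
    return payload
```

A filter like this would sit between the upstream SSE reader and the code that re-emits chunks to the browser.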

So I wrote (with a bit of copying from here and there) a chatCompletion handler that routes requests directly to the model endpoints. More or less the same structure can be reused for other types of requests as well.

It seems to work both when calling Cloudflare's APIs directly and with the AI Gateway, but I still need to write formal tests. I plan to do that in the next few days.

If anyone has any suggestions, they are definitely welcome! Once it’s more or less stable, I might open a PR.

danny-avila commented 3 months ago

> If anyone has any suggestions, they are definitely welcome! Once it’s more or less stable, I might open a PR.

Thanks so much @marcozac