alibaba / higress

🤖 AI Gateway | AI Native API Gateway
https://higress.io
Apache License 2.0
3.22k stars 509 forks

HEP: Supports LLM API proxy and management #738

Open · johnlanni opened 10 months ago

johnlanni commented 10 months ago

Why do you need it?

Who will use:

Developers who build businesses on top of LLM models (通义千问 / OpenAI / Gemini).


What problem to solve

  1. Developers cannot rely entirely on an LLM SaaS API to build applications: per-token billing is expensive, and they still need to do some NLP pre-processing themselves.
  2. Many LLM SaaS APIs are unstable and need fault-tolerance mechanisms such as retries and failover.
  3. Developers want to expose their own LLM-API-based capabilities to the outside world and generate revenue from them (API monetization).
  4. Developers want to let multiple users share a single API key for the LLM SaaS service, with independent per-consumer controls such as rate limiting, ACLs, and cost observation.


Why use Higress to achieve this

  1. The requirements above can all be implemented on top of Higress's Wasm extension mechanism.
  2. Neither adding/removing APIs nor changing plugin logic causes long-lived connections to be dropped, which makes it friendly to APIs with streaming (SSE) enabled.
  3. Managing LLM APIs through an API gateway reuses many existing API-management capabilities.
  4. HTTPS certificates can be renewed automatically.

How could it be implemented?

  1. Implement an LLMProxy plugin (multi-model protocol unification; retry/failover to improve stability; switching between open-source and commercial models to reduce cost; filtering and blocking of sensitive words)
  2. A consumer-model abstraction (reusing the existing authentication plugins): unified API-key authentication that shields callers from the different authentication mechanisms of the underlying models
  3. Per-consumer token/cost observation and analysis

Other related information

We already have a simple chatgpt-proxy plugin, and there is an article introducing it. We can continue to expand its capabilities based on that work.

sjcsjc123 commented 10 months ago

I can give it a try.

The interfaces I expect to call: 通义千问 (HTTP API) and the Gemini Go SDK.

johnlanni commented 10 months ago

@sjcsjc123 👍 Thanks! This is an HEP issue; it will be broken down further into sub-tasks available for claiming.

dspo commented 10 months ago

> need to do some NLP pre-processing themselves

Which kinds of processing, specifically?

> Many LLM APIs are unstable and need fault-tolerance mechanisms such as retries and failover.

Retries are easy to understand, but how would failover be implemented?

> Developers want to expose their LLM-API-based capabilities to the outside world and generate revenue from them (API monetization).

What is offered to consumers should be application-layer capability rather than simply redistributing the API; otherwise, why would consumers go through a middleman?

> Neither adding/removing APIs nor changing plugin logic causes long-lived connections to be dropped, which makes it friendly to APIs with streaming (SSE) enabled.

The current plugins all have to read the complete response body, so it seems they cannot support SSE.
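For context on this concern: supporting SSE would require the plugin to process the body chunk by chunk rather than buffering it whole. A minimal, illustrative event splitter in plain Go (not the Higress plugin API) might look like:

```go
package main

import (
	"fmt"
	"strings"
)

// sseSplitter accumulates body chunks and yields complete SSE events
// (separated by a blank line) as soon as they are available, so a plugin
// could inspect or rewrite each "data:" payload without buffering the
// whole stream.
type sseSplitter struct {
	buf string
}

// Feed appends a chunk and returns the data payloads of all events that
// became complete with this chunk; partial events stay buffered.
func (s *sseSplitter) Feed(chunk string) []string {
	s.buf += chunk
	var events []string
	for {
		i := strings.Index(s.buf, "\n\n")
		if i < 0 {
			return events // no complete event yet
		}
		raw := s.buf[:i]
		s.buf = s.buf[i+2:]
		for _, line := range strings.Split(raw, "\n") {
			if data, ok := strings.CutPrefix(line, "data: "); ok {
				events = append(events, data)
			}
		}
	}
}

func main() {
	s := &sseSplitter{}
	fmt.Println(s.Feed("data: hel"))         // [] - event not complete yet
	fmt.Println(s.Feed("lo\n\ndata: world")) // [hello]
	fmt.Println(s.Feed("\n\n"))              // [world]
}
```

Whether the Wasm plugin ABI exposes the response body in such streaming chunks is exactly the open question raised above.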

> Implement an LLMProxy plugin (multi-model protocol unification, retry/failover to improve stability, open-source/commercial model switching to reduce cost, sensitive-word filtering and blocking)

Does "multi-model protocol unification" mean exposing a single protocol that the community broadly accepts (such as OpenAI v1) and translating it into each LLM vendor's API? Or is there no translation, so the OpenAI protocol can only call OpenAI and the Azure protocol can only call Azure?