alibaba / higress

🤖 AI Gateway | AI Native API Gateway
https://higress.io
Apache License 2.0
3.22k stars 509 forks

HEP: Supports LLM API proxy and management #738

Open · johnlanni opened 10 months ago

johnlanni commented 10 months ago

Why do you need it?

Who will use:

Developers who build businesses on top of LLM models (通义千问 / OpenAI / Gemini).


What problem to solve

  1. Developers cannot rely entirely on an LLM SaaS API to build applications: per-token billing is expensive, and they still need to do some NLP pre-processing themselves.
  2. Many LLM SaaS APIs are unstable and need fault-tolerance mechanisms such as retries and failover.
  3. Developers want to expose their own LLM-API-based capabilities to the outside world and generate revenue from them (API monetization).
  4. Developers want to let multiple users share a single API key for the LLM SaaS service, with independent per-consumer controls such as rate limiting, ACLs, and cost observation.


Why use Higress to achieve this

  1. The requirements above can all be implemented on top of Higress's Wasm extension mechanism.
  2. Neither adding/removing APIs nor changing plugin logic causes long-lived connections to be dropped, which makes it friendly to APIs with streaming (SSE) enabled.
  3. Managing LLM APIs through an API gateway reuses many existing API-management capabilities.
  4. HTTPS certificates can be renewed automatically.

How could it be implemented?

  1. Implement an LLMProxy plugin (multi-model protocol unification; retry/failover to improve stability; switching between open-source and commercial models to reduce cost; filtering and blocking of sensitive words)
  2. A consumer-model abstraction (reusing the existing authentication plugins): unified API-key authentication that shields callers from the different authentication mechanisms of the underlying models
  3. Per-consumer token/cost observation and analysis

Other related information

We already have a simple chatgpt-proxy plugin, and there is an article introducing it. We can continue to expand its capabilities based on that work.

sjcsjc123 commented 10 months ago

I can give it a try.

The interfaces I expect to call: 通义千问 (HTTP API) and the Gemini Go SDK.

johnlanni commented 10 months ago

@sjcsjc123 👍 Thanks! This is an HEP issue; it will be broken down further into sub-tasks available for claiming.

dspo commented 10 months ago

> need to do some NLP pre-processing themselves

Which kinds of processing, specifically?

> Many LLM APIs are unstable and need fault-tolerance mechanisms such as retries and failover.

Retries are easy to understand, but how would failover be implemented?

> Developers want to expose their LLM-API-based capabilities to the outside world and generate revenue from them (API monetization).

What is offered to consumers should be application-layer capability rather than simply redistributing the API; otherwise, why would consumers go through a middleman?

> Neither adding/removing APIs nor changing plugin logic causes long-lived connections to be dropped, which makes it friendly to APIs with streaming (SSE) enabled.

The current plugins all have to read the complete response body, so it seems they cannot support SSE.
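For context on this concern: supporting SSE would require the plugin to process the body chunk by chunk rather than buffering it whole. A minimal, illustrative event splitter in plain Go (not the Higress plugin API) might look like:

```go
package main

import (
	"fmt"
	"strings"
)

// sseSplitter accumulates body chunks and yields complete SSE events
// (separated by a blank line) as soon as they are available, so a plugin
// could inspect or rewrite each "data:" payload without buffering the
// whole stream.
type sseSplitter struct {
	buf string
}

// Feed appends a chunk and returns the data payloads of all events that
// became complete with this chunk; partial events stay buffered.
func (s *sseSplitter) Feed(chunk string) []string {
	s.buf += chunk
	var events []string
	for {
		i := strings.Index(s.buf, "\n\n")
		if i < 0 {
			return events // no complete event yet
		}
		raw := s.buf[:i]
		s.buf = s.buf[i+2:]
		for _, line := range strings.Split(raw, "\n") {
			if data, ok := strings.CutPrefix(line, "data: "); ok {
				events = append(events, data)
			}
		}
	}
}

func main() {
	s := &sseSplitter{}
	fmt.Println(s.Feed("data: hel"))         // [] - event not complete yet
	fmt.Println(s.Feed("lo\n\ndata: world")) // [hello]
	fmt.Println(s.Feed("\n\n"))              // [world]
}
```

Whether the Wasm plugin ABI exposes the response body in such streaming chunks is exactly the open question raised above.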

> Implement an LLMProxy plugin (multi-model protocol unification, retry/failover to improve stability, open-source/commercial model switching to reduce cost, sensitive-word filtering and blocking)

Does "multi-model protocol unification" mean exposing a single protocol that the community broadly accepts (such as OpenAI v1) and translating it into each LLM vendor's API? Or is there no translation, so the OpenAI protocol can only call OpenAI and the Azure protocol can only call Azure?