Open pepesi opened 5 days ago
cc @rinfx
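@pepesi Is it that Baichuan and Zhipu do not return usage? Have you checked whether their APIs can support it? If they can, it would be best to extend the related logic in ai-proxy.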
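They do return usage. The problem is that the final message is [DONE], which is not a chunk object. So far I have only tested baichuan and zhipuai; the other providers have not been tested yet.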
baichuan
data: {"id":"chatcmpl-M6404016RK8MoIC","object":"chat.completion.chunk","created":1719490102,"model":"Baichuan4","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}
data:
data: {"id":"chatcmpl-M6404016RK8MoIC","object":"chat.completion.chunk","created":1719490102,"model":"Baichuan4","choices":[{"index":0,"delta":{"role":"assistant","content":"! How can I"}}]}
data:
data: {"id":"chatcmpl-M6404016RK8MoIC","object":"chat.completion.chunk","created":1719490103,"model":"Baichuan4","choices":[{"index":0,"delta":{"role":"assistant","content":" assist you today?"}}]}
data:
data: {"id":"chatcmpl-M6404016RK8MoIC","object":"chat.completion.chunk","created":1719490103,"model":"Baichuan4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}],"usage":{"prompt_tokens":3,"completion_tokens":10,"total_tokens":13}}
data:
data: [DONE]
data:
zhipuai
event:add
id:lang-to-lang-v4-1719490222731-141670
data:Hello! How can I assist you today? If you have any questions or need advice on a topic, feel free to ask.
event:finish
id:lang-to-lang-v4-1719490222731-141670
data:{"choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":29,"completion_tokens":28,"total_tokens":57},"request_id":null,"task_status":null,"created":1719490224,"model":"glm-4-0520","id":"8786554896418608324","error":null}
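To illustrate the point about the trailing [DONE] message, here is a minimal Python sketch of how usage could be pulled out of either stream shown above. It is not the plugin's actual code (the plugins are built with wasm-go); the function name `extract_usage` is hypothetical, and the sketch assumes the stream has already been split into lines.

```python
import json
from typing import Iterable, Optional


def extract_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last `usage` object seen in an SSE stream, or None."""
    usage = None
    for raw in sse_lines:
        line = raw.strip()
        # Only `data:` fields carry payloads; `event:` and `id:` lines are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Blank data lines and the terminating [DONE] sentinel are not JSON chunks.
        if not payload or payload == "[DONE]":
            continue
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            # For example, zhipuai `event:add` frames carry plain text, not JSON.
            continue
        if isinstance(chunk, dict) and isinstance(chunk.get("usage"), dict):
            usage = chunk["usage"]
    return usage


if __name__ == "__main__":
    stream = [
        'data: {"choices":[{"index":0,"delta":{"content":"Hello"}}]}',
        "data:",
        'data: {"choices":[{"index":0,"delta":{"content":""},"finish_reason":"stop"}],'
        '"usage":{"prompt_tokens":3,"completion_tokens":10,"total_tokens":13}}',
        "data:",
        "data: [DONE]",
    ]
    print(extract_usage(stream))
```

The key design choice is to remember the most recent chunk that carried usage instead of assuming the very last message does, so the [DONE] sentinel no longer hides the counts.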
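In my latest testing, I found that the ai-token-ratelimit plugin appears to depend on the data that ai-statistics injects into filter_state. We need to confirm whether this is the intended design.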
cc @cr7258
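@johnlanni The reason ai-statistics fails to count correctly matches what @pepesi described: the last message is [DONE] rather than a chunk object. ai-token-ratelimit relies on the input_token and output_token values injected by ai-statistics to do rate limiting. I have reviewed and verified the code in this PR: it fixes the ai-statistics token counting problem and also simplifies the duplicated code that sets input_token and output_token for ai-token-ratelimit.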
The ai-statistics plugin works correctly:
istio-proxy@higress-gateway-659965d767-tnpwv:/$ curl -s http://localhost:15090/stats/prometheus |grep token | grep -E "baichuan|qwen"
# TYPE route_baichuan_upstream_outbound_443__baichuan_dns_model_Baichuan4_input_token counter
route_baichuan_upstream_outbound_443__baichuan_dns_model_Baichuan4_input_token{} 24
# TYPE route_baichuan_upstream_outbound_443__baichuan_dns_model_Baichuan4_output_token counter
route_baichuan_upstream_outbound_443__baichuan_dns_model_Baichuan4_output_token{} 110
# TYPE route_qwen_upstream_outbound_443__qwen_dns_model_qwen_turbo_input_token counter
route_qwen_upstream_outbound_443__qwen_dns_model_qwen_turbo_input_token{} 13
# TYPE route_qwen_upstream_outbound_443__qwen_dns_model_qwen_turbo_output_token counter
route_qwen_upstream_outbound_443__qwen_dns_model_qwen_turbo_output_token{} 33
The ai-token-ratelimit plugin works correctly:
kubectl port-forward -n higress-system svc/higress-gateway 18000:80
# Configure /etc/hosts locally
curl "http://baichuan-test.com:18000/v1/chat/completions?apikey=777777" -H "Content-Type: application/json" -d '{
  "model": "Baichuan4",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "你好,你是谁?"
    }
  ],
  "stream": true
}' -i
HTTP/1.1 429 Too Many Requests
x-ratelimit-reset: 35
content-length: 17
content-type: text/plain
date: Sat, 29 Jun 2024 14:00:25 GMT
server: istio-envoy
Too many requests
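For readers who want to see why a budget of 5 tokens per minute leads to the 429 above, here is a rough Python sketch of a fixed-window token limiter. It is only an assumption about the general technique, not the plugin's implementation: the class `TokenWindowLimiter` and its methods are hypothetical, an in-memory dict stands in for Redis, and the token counts are the input_token and output_token values that ai-statistics would supply.

```python
import time
from typing import Callable, Dict, Tuple


class TokenWindowLimiter:
    """Fixed one-minute token budget per key (in-memory stand-in for Redis)."""

    def __init__(self, token_per_minute: int, now: Callable[[], float] = time.time):
        self.limit = token_per_minute
        self.now = now
        # key -> (window start in seconds, tokens consumed in that window)
        self.windows: Dict[str, Tuple[float, int]] = {}

    def _window(self, key: str) -> Tuple[float, int]:
        start, used = self.windows.get(key, (self.now(), 0))
        if self.now() - start >= 60:
            # The previous window expired, so start a fresh one.
            start, used = self.now(), 0
        self.windows[key] = (start, used)
        return start, used

    def check(self, key: str) -> Tuple[bool, int]:
        """Return (allowed, seconds until the window resets)."""
        start, used = self._window(key)
        reset = max(0, int(60 - (self.now() - start)))
        return used < self.limit, reset

    def record(self, key: str, input_tokens: int, output_tokens: int) -> None:
        """Charge the tokens of a finished response to the key's window."""
        start, used = self._window(key)
        self.windows[key] = (start, used + input_tokens + output_tokens)


if __name__ == "__main__":
    limiter = TokenWindowLimiter(token_per_minute=5)
    print(limiter.check("777777"))   # first request is let through
    limiter.record("777777", 3, 10)  # usage reported after the response
    print(limiter.check("777777"))   # budget exhausted until the window resets
```

Under these assumptions the first request passes because nothing has been charged yet, and every later request in the same window is rejected once the recorded usage exceeds the budget. This is also why the limiter cannot work if the token counts never arrive.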
Full configuration:
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-proxy
  namespace: higress-system
spec:
  matchRules:
  - config:
      provider:
        type: qwen
        apiTokens:
        - "<api-token>"
        modelMapping:
          'gpt-3': "qwen-turbo"
          'gpt-35-turbo': "qwen-plus"
          'gpt-4-turbo': "qwen-max"
          '*': "qwen-turbo"
    ingress:
    - qwen
  - config:
      provider:
        type: baichuan
        apiTokens:
        - "<api-token>"
    ingress:
    - baichuan
  url: oci://ghcr.io/cr7258/wasm-go-ai-proxy:v1.0.46
  phase: UNSPECIFIED_PHASE
  priority: 100
---
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-statistics
  namespace: higress-system
spec:
  defaultConfig:
    enable: true
  url: oci://ghcr.io/cr7258/wasm-go-ai-token-statistics:v1.0.47
  phase: UNSPECIFIED_PHASE
  priority: 200
---
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-token-ratelimit
  namespace: higress-system
spec:
  defaultConfig:
    rule_name: default_limit_by_param_apikey
    rule_items:
    - limit_by_param: apikey
      limit_keys:
      - key: 777777
        token_per_minute: 5
    redis:
      service_name: redis.static
      service_port: 6379
  url: oci://ghcr.io/cr7258/wasm-go-ai-token-ratelimit:v1.0.47
  phase: UNSPECIFIED_PHASE
  priority: 600
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/backend-protocol: HTTPS
    higress.io/destination: qwen.dns
    higress.io/proxy-ssl-name: dashscope.aliyuncs.com
    higress.io/proxy-ssl-server-name: "on"
  labels:
    higress.io/resource-definer: higress
  name: qwen
  namespace: higress-system
spec:
  ingressClassName: higress
  rules:
  - host: qwen-test.com
    http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/backend-protocol: HTTPS
    higress.io/destination: baichuan.dns
    higress.io/proxy-ssl-name: api.baichuan-ai.com
    higress.io/proxy-ssl-server-name: "on"
  labels:
    higress.io/resource-definer: higress
  name: baichuan
  namespace: higress-system
spec:
  ingressClassName: higress
  rules:
  - host: baichuan-test.com
    http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/destination: redis.static
    higress.io/ignore-path-case: "false"
  labels:
    higress.io/resource-definer: higress
  name: redis.static
spec:
  ingressClassName: higress
  rules:
  - http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix
---
apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
  - domain: dashscope.aliyuncs.com
    name: qwen
    port: 443
    type: dns
  - domain: api.baichuan-ai.com
    name: baichuan
    port: 443
    type: dns
  - domain: 192.168.2.150:6379 # locally started Redis service
    name: redis
    type: static
    port: 6379
related issue: https://github.com/alibaba/higress/issues/1057