alibaba / higress

Cloud Native API Gateway | 云原生API网关
https://higress.io
Apache License 2.0

ai-token-ratelimit plugin miscounts tokens in stream mode #1080

Closed cr7258 closed 1 week ago

cr7258 commented 1 week ago


Ⅰ. Issue Description

The ai-token-ratelimit plugin is configured to limit total token consumption to 200 tokens per minute.

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-token-ratelimit
  namespace: higress-system
spec:
  defaultConfig:
    rule_name: default_limit_by_param_apikey
    rule_items:
    - limit_by_param: apikey
      limit_keys:
      - key: 123456
        token_per_minute: 200
    redis:
      service_name: redis.dns
      service_port: 6379
  url: oci://ghcr.io/cr7258/wasm-go-ai-token-ratelimit:v1.0.47
  phase: UNSPECIFIED_PHASE
  priority: 600

The request consumed 46 tokens in total (13 input, 33 output).

curl "http://qwen-test.com:18000/v1/chat/completions?apikey=123456" -H "Content-Type: application/json"  -d '{
  "model": "gpt-3",
  "messages": [
    {
      "role": "user",
      "content": "你好,你是谁?"
    }
  ],
  "stream": true
}'
data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"我是"}}],"created":1719910877,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"通"}}],"created":1719910877,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"义"}}],"created":1719910877,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"千问,由阿里"}}],"created":1719910877,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"云开发的AI助手。我可以回答"}}],"created":1719910878,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"各种问题、提供信息和与用户"}}],"created":1719910878,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"进行对话。有什么我可以帮助你的吗"}}],"created":1719910878,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"delta":{"role":"assistant","content":"?"}}],"created":1719910878,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[{"index":0,"finish_reason":"stop"}],"created":1719910878,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{}}

data:{"id":"37b2a612-3004-9edf-b670-502b32602fac","choices":[],"created":1719910878,"model":"qwen-turbo","object":"chat.completion.chunk","usage":{"prompt_tokens":13,"completion_tokens":33,"total_tokens":46}}
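To double-check the 46-token figure against the stream itself, the usage can be read from the last chunk whose `usage` object is non-empty. The following is a minimal sketch, assuming the response body is available as plain text with one `data:` line per chunk; the helper name `final_usage` and the shortened sample chunks are illustrative and not part of any Higress plugin.

```python
import json


def final_usage(sse_text: str) -> dict:
    """Return the last non-empty `usage` object found in an SSE body."""
    usage = {}
    for line in sse_text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue  # some providers end the stream with a [DONE] marker
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # empty {} is falsy, so only real counts are kept
    return usage


# Shortened copies of chunks from the response above (ids and timestamps omitted).
sample = "\n\n".join([
    'data:{"choices":[{"index":0,"delta":{"role":"assistant","content":"我是"}}],"usage":{}}',
    'data:{"choices":[{"index":0,"finish_reason":"stop"}],"usage":{}}',
    'data:{"choices":[],"usage":{"prompt_tokens":13,"completion_tokens":33,"total_tokens":46}}',
])

print(final_usage(sample))
```

Only the final chunk carries counts here; every earlier chunk has an empty `usage`, so any consumer that acts on usage more than once per stream risks counting the same request repeatedly.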

The counts reported by the ai-statistics plugin are accurate.

istio-proxy@higress-gateway-56c8bd59d5-n4hv6:/$ curl -s 127.0.0.1:15020/stats/prometheus |grep qwen |grep token
route_upstream_model_input_token{ai_route="qwen",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 13
route_upstream_model_output_token{ai_route="qwen",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 33

However, the ai-token-ratelimit plugin deducted twice the actual token count in Redis: 200 - 108 = 92, which is 2 × 46.

127.0.0.1:6379> GET higress-token-ratelimit:default_limit_by_param_apikey:limit_by_param:apikey:123456
"108"

Ⅱ. Describe what happened


Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
spec:
  ports:
  - port: 6379
    targetPort: 6379
  selector:
    app: redis
---
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-proxy
  namespace: higress-system
spec:
  matchRules:
  - config:
      provider:
        type: qwen
        apiTokens:
        - "<your-ai-token>"
        modelMapping:
          'gpt-3': "qwen-turbo"
          'gpt-35-turbo': "qwen-plus"
          'gpt-4-turbo': "qwen-max"
          '*': "qwen-turbo"
    ingress:
    - qwen
  url: oci://ghcr.io/cr7258/wasm-go-ai-proxy:v1.0.46
  phase: UNSPECIFIED_PHASE
  priority: 100
---
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-statistics
  namespace: higress-system
spec:
  defaultConfig:
    enable: true
  url: oci://ghcr.io/cr7258/wasm-go-ai-token-statistics:v1.0.47
  phase: UNSPECIFIED_PHASE
  priority: 200
---
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-token-ratelimit
  namespace: higress-system
spec:
  defaultConfig:
    rule_name: default_limit_by_param_apikey
    rule_items:
    - limit_by_param: apikey
      limit_keys:
      - key: 123456
        token_per_minute: 200
    redis:
      service_name: redis.dns
      service_port: 6379
  url: oci://ghcr.io/cr7258/wasm-go-ai-token-ratelimit:v1.0.47
  phase: UNSPECIFIED_PHASE
  priority: 600
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/backend-protocol: HTTPS
    higress.io/destination: qwen.dns
    higress.io/proxy-ssl-name: dashscope.aliyuncs.com
    higress.io/proxy-ssl-server-name: "on"
  labels:
    higress.io/resource-definer: higress
  name: qwen
  namespace: higress-system
spec:
  ingressClassName: higress
  rules:
  - host: qwen-test.com
    http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/destination: redis.dns
    higress.io/ignore-path-case: "false"
  labels:
    higress.io/resource-definer: higress
  name: redis
spec:
  ingressClassName: higress
  rules:
  - http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix
---
apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
  - domain: dashscope.aliyuncs.com
    name: qwen
    port: 443
    type: dns
  - domain: redis.default.svc.cluster.local
    name: redis
    type: dns
    port: 6379

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

johnlanni commented 1 week ago

cc @rinfx

cr7258 commented 1 week ago

Found the cause: the image I built had this piece of code removed from the ai-token-ratelimit plugin: https://github.com/alibaba/higress/pull/1060#discussion_r1659828690

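As a result, the plugin read the count already recorded by ai-statistics before the stream ended, and then read it again after the stream ended, so the tokens were deducted twice.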
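The double deduction can be reproduced with a toy model. This is only a sketch of the failure mode described above, not the plugin's actual code: the function name, the frame list, and the idea that the removed code acted as an end-of-stream guard are all assumptions made for illustration. The real snippet is the one in the linked PR discussion.

```python
def remaining_after_stream(frames, limit, only_at_end):
    """Replay one streamed response and return the quota left afterwards.

    frames: total_tokens seen with each frame (0 when `usage` is empty).
    only_at_end: if True, deduct only on the final (end-of-stream) frame.
    """
    remaining = limit
    seen_total = 0  # stands in for the count published by ai-statistics
    for index, total_tokens in enumerate(frames):
        end_of_stream = index == len(frames) - 1
        if total_tokens:
            seen_total = total_tokens
        if only_at_end and not end_of_stream:
            continue  # hypothetical guard: wait for the end of the stream
        if seen_total:
            remaining -= seen_total  # without the guard this runs twice
    return remaining


# Assumed frame sequence: two content frames, the usage frame (46 tokens),
# then an empty end-of-stream frame.
frames = [0, 0, 46, 0]
buggy = remaining_after_stream(frames, limit=200, only_at_end=False)
fixed = remaining_after_stream(frames, limit=200, only_at_end=True)
print(buggy, fixed)  # 108 154
```

In this model the unguarded run ends at 108, matching the value seen in Redis, while the guarded run ends at 154, which is the expected 200 - 46.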