deepflowio / deepflow

eBPF Observability - Distributed Tracing and Profiling
https://deepflow.io
Apache License 2.0

[FR] Enhance self-learning collection of http2/gRPC header key values #8242

Open Fancyki1 opened 1 month ago

Fancyki1 commented 1 month ago

Search before asking

Description

Requirement: deepflow v6.4 implements high-performance decoding of HTTP/2 compressed headers with eBPF kprobe, automatically learning the compression dictionary shared by the two communicating parties. In practice, however, collecting custom headers suffers from loss, out-of-order entries, and overwriting; the proposal here is to match custom headers by collecting only the values. Source article: https://www.deepflow.io/blog/zh/053-high-performance-decoding-of-http2-compressed-headers-using-ebpf-kprobe/ Known defects:

  1. For HTTP/2 long-lived connections established before deepflow-agent starts, dynamic dictionary entries that already exist cannot be decoded.
  2. When using cBPF, packet loss, retransmission, and reordering on the network may introduce errors when reconstructing compressed headers (eBPF kprobe has no such limitation).
  3. Actual testing of v6.5 shows the compression dictionary can get out of order, so collected keys and values no longer correspond.

Problem description: when the compression dictionary gets out of order, collected keys and values no longer match. Observed behavior with the following configuration:

static_config:
  l7-protocol-advanced-features:
    extra-log-fields:
      http2:
      - field-name: "x-custom-code"
      - field-name: "x-custom-msg"
      - field-name: "x-custom-data"

Send an HTTP/2 or gRPC request:

:authority: www.xxxx.com
:method: POST
:path: /list?aid=6383&sdk_version=5.1.18_zip&device_platform=web&zip=1
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br, zstd
accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
content-encoding: gzip
content-length: 5368
content-type: application/json; charset=utf-8
origin: https://www.xxxx.com
priority: u=1, i
referer: https://www.xxxx.com/
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36 Edg/129.0.0.0
x-custom-code: 200
x-custom-msg: success
x-custom-data: {"test": "data"}

Technical background: https://kiosk007.top/post/http-2-0-header-compression/ — the HTTP/2 index table consists of the static table (RFC 7541) and the dynamic table.
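To make the failure mode concrete, here is a minimal Go sketch (not deepflow code, and simplified: real HPACK assigns dynamic-table entries indices starting at 62, after the 61 static-table entries) showing how a decoder that misses one dynamic-table insertion resolves the same index to a different header, which is exactly the key/value mismatch reported above:

package main

import "fmt"

type entry struct{ name, value string }

// insert places a new entry at the head of the table, as RFC 7541
// prescribes for the dynamic table (newest entry gets the lowest index).
func insert(t []entry, e entry) []entry { return append([]entry{e}, t...) }

func main() {
	// A decoder that observed every HEADERS frame builds the full table.
	full := []entry{}
	full = insert(full, entry{"x-custom-code", "200"})
	full = insert(full, entry{"x-custom-msg", "success"})
	full = insert(full, entry{"x-custom-data", `{"test": "data"}`})

	// A decoder that missed the x-custom-msg insertion (e.g. because a
	// frame was captured out of order) ends up with a shifted table.
	desynced := []entry{}
	desynced = insert(desynced, entry{"x-custom-code", "200"})
	desynced = insert(desynced, entry{"x-custom-data", `{"test": "data"}`})

	// The same index now resolves to two different headers:
	fmt.Println(full[1])     // {x-custom-msg success}
	fmt.Println(desynced[1]) // {x-custom-code 200}
}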

Location in the server code where the attributes are written to the database:

deepflow/server/ingester/flow_log/log_data/l7_flow_log.go


// AttributeNames and AttributeValues are parallel arrays with a
// one-to-one mapping: AttributeNames[i] => AttributeValues[i]
h.AttributeNames = append(h.AttributeNames, l.ExtInfo.AttributeNames...)
h.AttributeValues = append(h.AttributeValues, l.ExtInfo.AttributeValues...)
h.MetricsNames = append(h.MetricsNames, l.ExtInfo.MetricsNames...)
h.MetricsValues = append(h.MetricsValues, l.ExtInfo.MetricsValues...)
Examples of stored results:

Case 1: correct (a minority of records)

AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]

Case 2: incorrect (the majority of records)

x-custom-msg is overwritten by x-custom-code (the index table is parsed out of order):

AttributeNames = ["rpc_services","x-custom-code","x-custom-code","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]

x-custom-code is overwritten by x-custom-data (the index table is parsed out of order):

AttributeNames = ["rpc_services","x-custom-data","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]

**Proposed solution:**
Approach: since the self-learned HTTP/2 header index table still has shortcomings, start from values with distinctive features instead, and complete the keys through configuration.

First, a simple generic scenario: delimiter handling. Define a single header.

Defined header key: x-custom-content. The key itself carries no real meaning; when Wireshark and deepflow cannot learn it, it shows up as unknown.

Specific string delimiter: !#!

x-custom-content: "200!#!success!#!{\"test\": \"data\"}"

The actual protocol parsing may yield: unknown: "200!#!success!#!{\"test\": \"data\"}"

Add a configuration (several different designs are possible here; the following was chosen after real-world testing):

static_config:
  l7-protocol-advanced-features:
    extra-log-fields:
      http2:

Since special delimiters rarely occur naturally, any header value that the special delimiter splits into two or more parts is completed with the predefined keys according to the matching rule; a sketch of this logic follows the example below.

The completed result matches the normal self-learned header result, and the effect is stable:

AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
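As a sketch of the proposed completion logic (the delimiter constant, the key list, and the completeHeader function below are illustrative assumptions, not existing deepflow code or options):

package main

import (
	"fmt"
	"strings"
)

// Hypothetical configuration: the special delimiter and the ordered
// list of predefined keys to complete against.
const delimiter = "!#!"

var predefinedKeys = []string{"x-custom-code", "x-custom-msg", "x-custom-data"}

// completeHeader splits a value on the delimiter and, when the split
// yields at least two parts, maps the parts onto the predefined keys
// in order, recovering the key/value pairs.
func completeHeader(value string) (names, values []string, ok bool) {
	parts := strings.Split(value, delimiter)
	if len(parts) < 2 || len(parts) > len(predefinedKeys) {
		return nil, nil, false
	}
	for i, p := range parts {
		names = append(names, predefinedKeys[i])
		values = append(values, p)
	}
	return names, values, true
}

func main() {
	names, values, ok := completeHeader(`200!#!success!#!{"test": "data"}`)
	if ok {
		fmt.Println(names)  // [x-custom-code x-custom-msg x-custom-data]
		fmt.Println(values) // [200 success {"test": "data"}]
	}
}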

Additional scenario: regex matching (a field-redundancy approach)

# The defined header keys are custom dynamic-table fields; when deepflow
# cannot learn them they parse as unknown, so each value embeds its own key
x-custom-code: "x-custom-code:200"
x-custom-msg: "x-custom-msg:success"
x-custom-data: "x-custom-data:{\"test\": \"data\"}"
# Actual protocol parsing may yield:
# unknown: "x-custom-code:200"
# unknown: "x-custom-msg:success"
# unknown: "x-custom-data:{\"test\": \"data\"}"

Add a configuration:

static_config:
  l7-protocol-advanced-features:
    extra-log-fields:
      http2:
      - field-name: "x-custom-code"
        match-value-rule: "^x-custom-code:(.*)"
        field-value-index: 0
      - field-name: "x-custom-msg"
        match-value-rule: "^x-custom-msg:(.*)"
        field-value-index: 0
      - field-name: "x-custom-data"
        match-value-rule: "^x-custom-data:(.*)"
        field-value-index: 0

Example pseudocode for the matching:

import re

input_string = "x-custom-msg:success"
pattern = r"^x-custom-msg:(.*)"

match = re.match(pattern, input_string)

if match:
    result = match.group(1)
    print("匹配成功!")
    print("提取的内容:", result) # success
else:
    print("匹配失败")

Result after matching and extraction:

# x-custom-code: "200"
# x-custom-msg: "success"
# x-custom-data: "{\"test\": \"data\"}"
AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]

Note: using fields from the HTTP/2 static table (user-agent, server), deepflow's collection is much more stable, but the corresponding server code would have to be modified, and repurposing static-table fields neither conforms to the protocol standard nor is safe. Please consider whether dynamic-table handling can be made compatible, to support the custom HTTP/2 header scenario. @sharang

Use case

No response

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

sharang commented 1 month ago

@Fancyki1 The approach you describe is a good one. It amounts to defining an http/grpc header injection convention: relying on the distinctiveness of the value, all the content that needs to be injected is packed into a single value.

Let's think about how to promote this practice at the specification level.

gbling commented 2 weeks ago

Do versions below 6.4 have this problem as well?

Fancyki1 commented 2 weeks ago

@gbling The source article covers this: https://www.deepflow.io/blog/zh/053-high-performance-decoding-of-http2-compressed-headers-using-ebpf-kprobe/ — versions before 6.4 do not support this feature at all.

gbling commented 1 week ago

@Fancyki1 Just to confirm: does the same issue also occur with the HTTP/1.1 protocol?

Fancyki1 commented 1 week ago

@gbling For HTTP/1.1 you can implement parsing with a wasm plugin; this feature is not needed there.

gbling commented 1 week ago

@Fancyki1 Here is our situation: when testing distributed tracing we correlate requests through a custom http_log_x_request_id, and all internal calls use HTTP/1.1, so traces can come out incomplete. I'd like to confirm whether this feature only takes effect for HTTP2/gRPC, or whether it also works for HTTP/1.1.

Fancyki1 commented 1 week ago

@gbling Please take a closer look at the documentation; this is all described there:

    ## Configuration to extract the customized header fields of HTTP, HTTP2, GRPC protocol etc
    #extra-log-fields:
    ## for example:
    ## http:
    ## - field-name: "user-agent"
    ## - field-name: "cookie"
    #  http: []
    #  http2: []

If you run a version > v6.4 and configure http, the feature is enabled for HTTP/1.1, and HTTP/1.1 does not suffer from the out-of-order, incomplete collection problem of the HTTP2 index table, so you can use it directly. Also, be clear about what you actually want to achieve: if it is distributed tracing, this feature is unrelated to that; but if you want to check whether every request in a trace carries http_log_x_request_id, it can indeed help with troubleshooting.
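For reference, a minimal HTTP/1.1 counterpart of the documented example quoted above might look like the following (the x-request-id field name is only an illustration; use whatever header your services inject):

static_config:
  l7-protocol-advanced-features:
    extra-log-fields:
      http:
      - field-name: "x-request-id"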