Open Fancyki1 opened 3 weeks ago
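Requirement: DeepFlow v6.4 uses eBPF kprobe for high-performance decoding of HTTP2 compressed headers and automatically learns the compression dictionaries of both peers. In practice, however, custom headers are collected with missing, out-of-order, or overwritten entries. I would like to solve custom header matching by collecting only the value. Source: https://www.deepflow.io/blog/zh/053-high-performance-decoding-of-http2-compressed-headers-using-ebpf-kprobe/

Defect:

Problem description: the learned compression dictionary can get out of order, so the keys and values of the collected content no longer line up. Observed in testing with this configuration: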

    static_config:
      l7-protocol-advanced-features:
        extra-log-fields:
          http2:
            - field-name: "x-custom-code"
            - field-name: "x-custom-msg"
            - field-name: "x-custom-data"

Send an HTTP2/gRPC request:

    :authority: www.xxxx.com
    :method: POST
    :path: /list?aid=6383&sdk_version=5.1.18_zip&device_platform=web&zip=1
    :scheme: https
    accept: */*
    accept-encoding: gzip, deflate, br, zstd
    accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
    content-encoding: gzip
    content-length: 5368
    content-type: application/json; charset=utf-8
    origin: https://www.xxxx.com
    priority: u=1, i
    referer: https://www.xxxx.com/
    user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36 Edg/129.0.0.0
    x-custom-code: 200
    x-custom-msg: success
    x-custom-data: {"test": "data"}

Technical background: https://kiosk007.top/post/http-2-0-header-compression/

The HTTP2 index table consists of the static table (defined in RFC 7541) and the dynamic table.
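
The toy model below is one way to picture how a passive decoder can end up with the wrong name for a dynamic-table index. It is a sketch under assumptions, not a description of DeepFlow's decoder: the `DynamicTable` class and the scenario in which the observer misses one insertion are hypothetical. It relies only on two RFC 7541 rules: the static table holds 61 entries, and the newest dynamic entry always takes the lowest dynamic index (62).

```python
from collections import deque

STATIC_TABLE_SIZE = 61  # RFC 7541 Appendix A: static indexes 1..61


class DynamicTable:
    """Toy HPACK dynamic table that tracks header names only."""

    def __init__(self):
        self.entries = deque()

    def insert(self, name):
        # The newest entry goes to the front, so older entries shift to higher indexes.
        self.entries.appendleft(name)

    def name_at(self, index):
        """Resolve an absolute HPACK index (62 and above) to a header name."""
        pos = index - (STATIC_TABLE_SIZE + 1)
        if 0 <= pos < len(self.entries):
            return self.entries[pos]
        return "unknown"


sender = DynamicTable()
observer = DynamicTable()  # hypothetical passive decoder

for name in ["x-custom-code", "x-custom-msg", "x-custom-data"]:
    sender.insert(name)
    if name != "x-custom-msg":  # assumption: the observer missed this one insertion
        observer.insert(name)

indexes = (62, 63, 64)
sender_names = [sender.name_at(i) for i in indexes]
observer_names = [observer.name_at(i) for i in indexes]
print(sender_names)
print(observer_names)
```

In this run, index 63 means `x-custom-msg` to the sender but `x-custom-code` to the observer, and index 64 cannot be resolved at all, which resembles the overwritten and missing names reported in this issue.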
Where the server code writes these fields to storage:

deepflow\server\ingester\flow_log\log_data\l7_flow_log.go

    // AttributeNames and AttributeValues are both arrays.
    // They map one-to-one as key => value: AttributeNames[i] => AttributeValues[i]
    h.AttributeNames = append(h.AttributeNames, l.ExtInfo.AttributeNames...)
    h.AttributeValues = append(h.AttributeValues, l.ExtInfo.AttributeValues...)
    h.MetricsNames = append(h.MetricsNames, l.ExtInfo.MetricsNames...)
    h.MetricsValues = append(h.MetricsValues, l.ExtInfo.MetricsValues...)

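Examples of stored results:

Case 1: normal (a minority of records)

    AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
    AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]

Case 2: abnormal (the majority of records)

x-custom-msg is overwritten by x-custom-code; the index table is resolved out of order:

    AttributeNames = ["rpc_services","x-custom-code","x-custom-code","x-custom-data"]
    AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]

x-custom-code is overwritten by x-custom-data; the index table is resolved out of order:

    AttributeNames = ["rpc_services","x-custom-data","x-custom-msg","x-custom-data"]
    AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
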
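
To make the impact concrete, here is a minimal sketch of a consumer that pairs the two arrays by position. `pair_attributes` is a hypothetical helper, not DeepFlow code, and it assumes the consumer builds a key/value map; under that assumption a duplicated name silently hides one field.

```python
def pair_attributes(names, values):
    """Pair names[i] with values[i]; a repeated name keeps only its last value."""
    if len(names) != len(values):
        raise ValueError("AttributeNames and AttributeValues must have equal length")
    return dict(zip(names, values))


values = ["xxx", "200", "success", '{"test": "data"}']

normal = pair_attributes(
    ["rpc_services", "x-custom-code", "x-custom-msg", "x-custom-data"], values
)
broken = pair_attributes(
    ["rpc_services", "x-custom-code", "x-custom-code", "x-custom-data"], values
)

print(normal)
print(broken)  # no "x-custom-msg" key, and "x-custom-code" now holds "success"
```

A query on `x-custom-code` against the second record would return `success` instead of `200`, and `x-custom-msg` would not be found.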
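**Proposed solution:**

Idea: since self-learning of the HTTP2 header index table still has gaps, start instead from values that have a recognizable shape and fill in the keys through configuration.

Start with a simple, general scenario: delimiter handling. Define a single header whose value joins the fields with a dedicated delimiter string, `!#!`. The header key `x-custom-content` carries no real meaning; when Wireshark or DeepFlow has not learned it, the name shows up as unknown.

    x-custom-content: "200!#!success!#!{\"test\": \"data\"}"

The protocol parser may therefore actually produce:

    unknown: "200!#!success!#!{\"test\": \"data\"}"
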
Add a configuration. Several different schemes are possible here; after hands-on testing:

    static_config:
      l7-protocol-advanced-features:
        extra-log-fields:
          http2:

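Because the special delimiter rarely occurs in ordinary values, any header value that can be split by it into two or more parts is completed according to the matching rule and the predefined keys.

The completed result is identical to that of a correctly self-learned header, and the behavior is stable.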
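
The completion step described above can be sketched as follows. This is an illustration only: `DELIMITER`, `FIELD_NAMES`, and `complete_from_value` are hypothetical names, and the sketch assumes the parts of the value appear in the same order as the predefined keys.

```python
DELIMITER = "!#!"
# Assumed order of the predefined keys; it must match the order used by the sender.
FIELD_NAMES = ["x-custom-code", "x-custom-msg", "x-custom-data"]


def complete_from_value(value, delimiter=DELIMITER, field_names=FIELD_NAMES):
    """Split a header value on the delimiter and assign the predefined keys.

    Values that do not split into at least two parts are ignored, because
    they do not look like an injected composite value.
    """
    parts = value.split(delimiter)
    if len(parts) < 2:
        return {}
    # zip stops at the shorter sequence, so surplus parts or keys are dropped.
    return dict(zip(field_names, parts))


completed = complete_from_value('200!#!success!#!{"test": "data"}')
ignored = complete_from_value("gzip, deflate, br, zstd")
print(completed)
print(ignored)
```

Because the decision depends only on the value, the result is the same whether the header name was decoded as `x-custom-content` or as unknown.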
Additional scenario: regex matching (redundant-field approach)

    # Header key defined here: x-custom-content. Under the HTTP2 standard it is a
    # dynamic-table field, and its parsed name carries no real meaning.
    # Delimiter string: !#!
    x-custom-code: "x-custom-code:200"
    x-custom-msg: "x-custom-msg:success"
    x-custom-data: "x-custom-data:{\"test\": \"data\"}"
    # The protocol parser may actually produce:
    # unknown: "x-custom-code:200"
    # unknown: "x-custom-msg:success"
    # unknown: "x-custom-data:{\"test\": \"data\"}"

Add a configuration:

    static_config:
      l7-protocol-advanced-features:
        extra-log-fields:
          http2:
            - field-name: "x-custom-code"
              match-value-rule: "^x-custom-code:(.*)"
              field-value-index: 0
            - field-name: "x-custom-msg"
              match-value-rule: "^x-custom-msg:(.*)"
              field-value-index: 0
            - field-name: "x-custom-data"
              match-value-rule: "^x-custom-data:(.*)"
              field-value-index: 0

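Example pseudocode for the matching step:

    import re

    input_string = "x-custom-msg:success"
    pattern = r"^x-custom-msg:(.*)"

    # re.match anchors at the start of the string, consistent with the ^ in the rule
    match = re.match(pattern, input_string)
    if match:
        result = match.group(1)  # text captured after the "x-custom-msg:" prefix
        print("Match succeeded")
        print("Extracted value:", result)  # success
    else:
        print("Match failed")
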
Result after matching and parsing:

    # x-custom-code: "200"
    # x-custom-msg: "success"
    # x-custom-data: "{\"test\": \"data\"}"
    AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
    AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]

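
Extending the single-rule pseudocode, the sketch below applies all three proposed rules to a list of decoded header values and builds the two parallel arrays. The `RULES` table and `extract_fields` are hypothetical, and reading `field-value-index` as an index into the regex capture groups is an assumption about the proposed option, not existing DeepFlow behavior.

```python
import re

# (field-name, match-value-rule, field-value-index), mirroring the proposed config
RULES = [
    ("x-custom-code", re.compile(r"^x-custom-code:(.*)"), 0),
    ("x-custom-msg", re.compile(r"^x-custom-msg:(.*)"), 0),
    ("x-custom-data", re.compile(r"^x-custom-data:(.*)"), 0),
]


def extract_fields(header_values, rules=RULES):
    """Match every value against the rules, ignoring the decoded header name."""
    names, values = [], []
    for raw in header_values:
        for field_name, pattern, group_index in rules:
            match = pattern.match(raw)
            if match:
                names.append(field_name)
                values.append(match.groups()[group_index])
                break  # the first matching rule wins
    return names, values


# Values as they might arrive when every header name was decoded as unknown.
decoded_values = [
    "x-custom-msg:success",
    "gzip",
    "x-custom-code:200",
    'x-custom-data:{"test": "data"}',
]
attribute_names, attribute_values = extract_fields(decoded_values)
print(attribute_names)
print(attribute_values)
```

Each name is derived from the value it was extracted from, so the pairing holds even when the values arrive in a different order or with unrelated headers in between.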
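Note: when the HTTP2 static-table fields `user-agent` and `server` are used to carry the data, DeepFlow's collection is much more stable, but the corresponding server code has to be modified. Using static-table fields this way does not conform to the protocol standard and is not safe. Could the dynamic-table handling be made compatible so that custom HTTP2 headers are supported? @sharang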
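@Fancyki1 The approach you describe is a good one. It amounts to defining a convention for HTTP/gRPC header injection: by relying on the distinctive shape of the value, everything that needs to be injected is placed in a single value.

Let's think about how to push this forward at the specification level.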