deepflowio / deepflow

:sparkles: Zero-code distributed tracing and profiling, observability via eBPF :rocket:
https://deepflow.io
Apache License 2.0

[BUG] Filtering traffic by dst with capture_bpf does not produce the expected filtering, and the agent log reports an error #6140

Open JiezengDev opened 2 months ago

JiezengDev commented 2 months ago

Search before asking

DeepFlow Component

Agent

What you expected to happen

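Expected

I added the configuration below to the agent-group, expecting it to filter out BPF data for the specified dst IPs. All of the following IPs are outside the cluster and have no DeepFlow components deployed.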

capture_bpf:  not (dst host 10.218.4.10 or dst host 10.218.23.54 or dst host 10.218.23.222 or dst host 10.218.23.209)
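
The intent of this expression can be sketched in a few lines of Python. This is only a simplified model of the expected behavior (it considers the IP destination address only and ignores non-IP traffic), not how the agent or libpcap evaluates the filter; `build_capture_bpf` and `should_capture` are hypothetical helper names introduced for illustration.

```python
import ipaddress

# Destination IPs taken from the capture_bpf expression in this issue.
EXCLUDED_DSTS = ["10.218.4.10", "10.218.23.54", "10.218.23.222", "10.218.23.209"]


def build_capture_bpf(dst_ips):
    """Build a 'not (dst host A or dst host B ...)' expression (hypothetical helper)."""
    # Validate each address so a typo fails here rather than in the agent.
    hosts = [str(ipaddress.ip_address(ip)) for ip in dst_ips]
    return "not (" + " or ".join(f"dst host {h}" for h in hosts) + ")"


def should_capture(dst_ip, excluded=EXCLUDED_DSTS):
    """Model the intended semantics: keep a packet unless its dst IP is excluded."""
    return str(ipaddress.ip_address(dst_ip)) not in set(excluded)


expr = build_capture_bpf(EXCLUDED_DSTS)
print(expr)
print(should_capture("10.218.23.54"))  # a dst listed in the filter
print(should_capture("10.218.2.43"))   # a dst not listed in the filter
```

Under this model, traffic destined for any of the four listed IPs should be absent from the collected data, while all other traffic should still be captured.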

Actual behavior

  1. The data collected into ClickHouse does not match expectations
  2. The agent log reports an error:
    ERROR [src/dispatcher/mod.rs:541] Capture customized bpf(not (dst host 10.218.4.10 or dst host 10.218.23.54 or dst host 10.218.23.222 or dst host 10.218.23.209)) error, use default only.

agent-group configuration

vtap_group_id: g-d5fb660ed5
max_cpus: 2
max_memory: 2048
#l4_log_collect_nps_threshold: 50000
#l7_log_collect_nps_threshold: 50000
domains:
- e4ac8c34-5b37-5dd2-a1d3-3bbc8323c0d3
- 14d07276-7da4-56d4-bf31-847dd4ece446

## TraceID Keys
## Default: traceparent, sw8.
## Note: Used to extract the TraceID field in HTTP and RPC headers, supports filling
##   in multiple values separated by commas. This feature can be turned off by
##   setting it to empty.
http_log_trace_id: _catRootMessageId,traceparent,sw8

## SpanID Keys
## Default: traceparent, sw8.
## Note: Used to extract the SpanID field in HTTP and RPC headers, supports filling
##   in multiple values separated by commas. This feature can be turned off by
##   setting it to empty.
http_log_span_id: _catParentMessageId,traceparent,sw8

## Protocol Identification Maximum Packet Length
## Default: 1024. Range: [256, 8192]
## Note: The maximum data length used for application protocol identification,
##   note that the effective value is less than or equal to the value of
##   capture_packet_size.
l7_log_packet_size: 3000

## Maximum Sending Rate for l4_flow_log
## Default: 10000. Range: [100, 1000000]
## Note: The maximum number of rows of l4_flow_log sent per second, when the actual
##   number of rows exceeds this value, sampling is triggered.
l4_log_collect_nps_threshold: 50000

## Maximum Sending Rate for l7_flow_log
## Default: 10000. Range: [100, 1000000]
## Note: The maximum number of rows of l7_flow_log sent per second, when the actual
##   number of rows exceeds this value, sampling is triggered.
l7_log_collect_nps_threshold: 50000

capture_bpf:  not (dst host 10.218.4.10 or dst host 10.218.23.54 or dst host 10.218.23.222 or dst host 10.218.23.209)

static_config:
  l7-protocol-advanced-features:
    obfuscate-enabled-protocols: 
      - MySQL
      - PostgreSQL
      - Redis

How to reproduce

No response

DeepFlow version

agent version

Defaulted container "deepflow-agent" out of: deepflow-agent, configure-sysctl (init)
9760-b3f2758a3ad8a8b06ff221251a3ef1c80abbc6ff
Name: deepflow-agent community edition
Branch: v6.4
CommitId: b3f2758a3ad8a8b06ff221251a3ef1c80abbc6ff
RevCount: 9760
Compiler: rustc 1.75.0 (82e1608df 2023-12-21)
CompileTime: 2024-03-28 03:09:51

server version

2024/04/12 14:11:38 ENV K8S_NODE_NAME_FOR_DEEPFLOW=ucpcompute34-test-py-cloudvsp; K8S_NODE_IP_FOR_DEEPFLOW=10.218.2.43; K8S_POD_NAME_FOR_DEEPFLOW=deepflow-server-57f45b5df7-zdr7c; K8S_POD_IP_FOR_DEEPFLOW=10.218.44.115; K8S_NAMESPACE_FOR_DEEPFLOW=deepflow
Name: deepflow-server community edition
Branch: v6.4
CommitID: 2f352404119929699874cc3fce8d4b180222db09
RevCount: 9755
Compiler: go version go1.20.14 linux/amd64
CompileTime: 2024-03-27 07:31:49

DeepFlow agent list

image

Kubernetes CNI

calico :v3.25.1

Operation-System/Kernel version

CentOS Stream release 8 4.18.0-500.el8.x86_64

Anything else

No response

Are you willing to submit a PR?

Code of Conduct

yuanchaoa commented 2 months ago

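We already have a bug tracking a similar problem: the same BPF expression sometimes succeeds and sometimes fails on the agent.

Does the problem in this issue show the same behavior? Do all agents report the error?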

JiezengDev commented 2 months ago

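We already have a bug tracking a similar problem: the same BPF expression sometimes succeeds and sometimes fails on the agent.

Does the problem in this issue show the same behavior? Do all agents report the error?

Judging from the logs, not all agents report the error. What log does the agent print when the BPF expression is applied successfully? I will search for it and compare.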
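
To compare agents, the failure line quoted earlier in this issue can be searched for in each agent's log. The sketch below assumes the log text has already been collected as a list of lines; `find_bpf_errors` is a hypothetical helper name, and the pattern covers only the error message shown in this issue, since the message logged on success is not given here.

```python
import re

# Matches the error line quoted in this issue and captures the BPF expression.
# The greedy group handles nested parentheses inside the expression.
ERROR_RE = re.compile(
    r"Capture customized bpf\((?P<expr>.*)\) error, use default only\."
)


def find_bpf_errors(log_lines):
    """Return the BPF expressions that the agent reported as failed."""
    hits = []
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            hits.append(match.group("expr"))
    return hits


# The first line is the error from this issue; the second is a made-up
# placeholder for any unrelated log line.
sample_log = [
    "ERROR [src/dispatcher/mod.rs:541] Capture customized bpf(not (dst host "
    "10.218.4.10 or dst host 10.218.23.54 or dst host 10.218.23.222 or dst host "
    "10.218.23.209)) error, use default only.",
    "INFO unrelated line",
]
failed = find_bpf_errors(sample_log)
print(failed)
```

Running this over the log of every agent in the group would show which agents rejected the expression and which did not.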

zhangzujian commented 5 days ago

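+1. Even slightly complex expressions fail very easily. When I test the BPF expression with tcpdump or with a libpcap C program, it is always correct, but the agent reports an error.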
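
A tcpdump check of this kind can be scripted roughly as follows. This is a sketch under assumptions: it relies on tcpdump's `-d` option, which prints the compiled filter program and exits, it may need capture privileges on some systems, and it says nothing about how the agent itself compiles the expression. `tcpdump_compile_cmd` and `try_compile` are hypothetical helper names.

```python
import shutil
import subprocess


def tcpdump_compile_cmd(expr):
    """Build the command that asks tcpdump to compile a filter and dump it."""
    return ["tcpdump", "-d", expr]


def try_compile(expr, timeout=10):
    """Return True/False for compile success, or None if tcpdump is unavailable."""
    if shutil.which("tcpdump") is None:
        return None
    try:
        proc = subprocess.run(
            tcpdump_compile_cmd(expr),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # A non-zero exit status can also mean missing privileges, not a bad filter.
    return proc.returncode == 0


if __name__ == "__main__":
    bpf = (
        "not (dst host 10.218.4.10 or dst host 10.218.23.54 "
        "or dst host 10.218.23.222 or dst host 10.218.23.209)"
    )
    print(try_compile(bpf))
```

If the expression compiles here but the agent still logs the error, that points at the agent's own handling of the filter rather than at the expression syntax.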