Open kwenzh opened 1 year ago
Hello, eBPF does have some performance overhead; I talked about it in my last live broadcast. We're currently sorting through the data, and the first version will be publicly available soon. Also, you can disable eBPF or the eBPF uprobe. You can add the WeChat account at the bottom of the README; let's communicate on WeChat.
> disable eBPF or the eBPF uprobe
OK, thank you. Will disabling the eBPF probe have any effect, e.g. on network topology monitoring capabilities?
Network topology is not affected, but distributed tracing is.
https://mp.weixin.qq.com/s/oNrTG4ExNOvwV6luPaC4zA
We have some performance test data for reference, and you can also try disabling only the eBPF uprobe @kwenzh
Hey guys,
I was doing some performance tests and I think the deepflow-agent is impacting the throughput and requests per second on a K8s cluster. How the test was done: I used a pod running a K6 script from the K8s cluster, targeting an nginx server running on a VM. All tests were run against DeepFlow 6.3.5.
This is the result WITH the deepflow-agent running. As the image shows, the script peaked at 11.37 req/s.
This is the result WITHOUT the deepflow-agent running, for the same script on the same cluster (I only deleted the DaemonSet). As you can see, the test was able to reach 19.1k.
I ran this test twice, with and without the agent, and the results were the same: very similar requests per second across runs. I am running it a third time now and will post the results here in a few moments.
With agent running on K8S - test redo
Without agent running on K8S - test redo
With agent running on K8S - test redo 2
### Last 3 tests compared
Here is an overview of the last 3 tests. As we can see, the response time increased considerably when the deepflow-agent was running.
I hope I could help.
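To put a number on the gap reported above, here is a small helper (not from the thread; the figures are the peaks reported in the screenshots, treated here as thousands of requests per second):

```python
# Hypothetical helper to quantify the reported throughput drop.
def overhead_pct(baseline_rps: float, with_agent_rps: float) -> float:
    """Relative throughput drop, in percent, when the agent is running."""
    return (baseline_rps - with_agent_rps) / baseline_rps * 100.0

# 19.1k req/s without the agent vs 11.37k with it: roughly a 40% drop,
# in line with the ~40% degradation reported later in this issue.
print(f"{overhead_pct(19.1, 11.37):.1f}% throughput drop")
```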
Yes, same here. I tried adjusting the deepflow-agent parameters and it got a little better; maybe you can try it: https://deepflow.io/docs/zh/install/advanced-config/agent-advanced-config/
```yaml
vtap_group_id: g-d32cd8e4ef
capture_packet_size: 2048
static_config:
  ebpf:
    disabled: true
```
@dirtyren Alternatively, you can try disabling only the eBPF uprobe and testing again:
```yaml
vtap_group_id: g-d32cd8e4ef
capture_packet_size: 2048
static_config:
  ebpf:
    uprobe-process-name-regexs:
      golang-symbol: ""
      golang: ""
      openssl: ""
```
I applied this config using deepflow-ctl, but the dashboards are still showing eBPF sources in the last 5 minutes, and the performance test yields the same results:
```yaml
vtap_group_id: g-3c66e436c9
log_level: ERROR
tap_interface_regex: '^(tap.*|gke.*|cali.*|veth.*|eth.*|en[ospx].*|lxc.*|lo|[0-9a-f]+_h)$'
external_agent_http_proxy_enabled: 1   # required
external_agent_http_proxy_port: 38086  # optional, default 38086
capture_packet_size: 2048
static_config:
  ebpf:
    uprobe-process-name-regexs:
      golang-symbol: ""
      golang: ""
      openssl: ""
```
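As a side note, the `tap_interface_regex` in the config above decides which network interfaces the agent captures on. A quick sketch (the interface names are just illustrative examples) shows what it matches:

```python
import re

# The tap_interface_regex from the agent config above; the agent only
# captures packets on interfaces whose names match this pattern.
TAP_RE = re.compile(
    r'^(tap.*|gke.*|cali.*|veth.*|eth.*|en[ospx].*|lxc.*|lo|[0-9a-f]+_h)$'
)

# Typical Calico / veth / ethernet names match; a docker bridge does not.
for name in ("eth0", "cali1234abcd", "veth0a1b", "lo", "docker0"):
    print(name, bool(TAP_RE.match(name)))
```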
I think `capture_packet_size: 2048` solved my problem; the metrics are very similar with or without the deepflow-agent running.
Yes, adjusting `capture_packet_size: 2048` helps.
### Search before asking
### DeepFlow Component
Agent
### What you expected to happen
After deploying DeepFlow in a k8s cluster, we found a performance degradation in the programs within the cluster, including increased latency and reduced QPS, mainly affecting some HTTP services and MQ consumer tasks. Performance has decreased by about 40%.
### How to reproduce
Make a test case. I did a simple test: I started an HTTP API and tested it with the `ab` tool. Demo code:

Client test, e.g.:

```shell
ab -n 2000 -c 10 'http://k8s-ip:nodeport/get_num?num=22'
```
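The demo code for the HTTP API is not included in the issue, so here is a hypothetical stand-in under that assumption (stdlib only; the handler behavior of echoing `num` back is a guess, since the real endpoint's logic is unknown):

```python
# Hypothetical stand-in for the HTTP API under test; serves
# GET /get_num?num=N so `ab` has a cheap endpoint to hammer.
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def get_num(num: int) -> str:
    # Placeholder for whatever the real /get_num endpoint computes.
    return str(num)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/get_num":
            num = int(parse_qs(url.query).get("num", ["0"])[0])
            body = get_num(num).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep ab runs quiet

if __name__ == "__main__" and "--serve" in sys.argv:
    # python server.py --serve
    # then: ab -n 2000 -c 10 'http://host:8000/get_num?num=22'
    HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

Run it with `--serve` and point `ab` at port 8000 to get a comparable baseline workload.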
no deepflow-agent result:
deepflow-agent running:
It looks like QPS has decreased by about 25%. I know deepflow-agent uses eBPF technology.
### DeepFlow version
DeepFlow version: v6.2.6
Kernel version: 5.15.72
k8s version: v1.18.19
### DeepFlow agent list
Each k8s node has a deepflow-agent pod.

```
deepflow-ctl agent list
VTAP_ID NAME                               TYPE   CTRL_IP CTRL_MAC          STATE  GROUP   EXCEPTIONS
2       dev-szdl-k8s-slave-5.novalocal-V9  K8S_VM 10.x    fe:fc:fe:03:45:ae NORMAL default
3       dev-szdl-k8s-slave-7.novalocal-V1  K8S_VM 10.x    fe:fc:fe:68:15:78 NORMAL default
4       dev-szdl-k8s-slave-6.novalocal-V10 K8S_VM 10.x    fe:fc:fe:5c:b5:0f NORMAL default
5       dev-szdl-k8s-slave-1.novalocal-V7  K8S_VM 10.x    fe:fc:fe:71:51:77 NORMAL default
6       dev-szdl-k8s-slave-4.novalocal-V2  K8S_VM 10.x    fe:fc:fe:59:5f:6a NORMAL default
7       dev-szdl-k8s-slave-3.novalocal-V3  K8S_VM 10.x    fe:fc:fe:6f:fa:eb NORMAL default
8       dev-szdl-k8s-slave-2.novalocal-V8  K8S_VM 10.x    fe:fc:fe:43:0a:0b NORMAL default
```
### Kubernetes CNI
calico
### Operation-System/Kernel version
5.15.72 (5.15.72-1.sdc.el7.elrepo.x86_64)
### Anything else
I know deepflow-agent uses eBPF technology, so I would like to confirm whether deepflow-agent affects the Linux kernel's network forwarding performance or the CPU performance of programs across the cluster, e.g. CPU scheduling and network forwarding.
In another test, with an HTTP POST API, I compared running the deepflow-agent to not running it. With the agent running, the QPS drops from 5000+ to 2000+, almost -50%.
### Are you willing to submit a PR?
### Code of Conduct