When we started stress testing after deploying the deepflow-agent in the cluster,
we found that the average response time of our Java microservices increased roughly fourfold,
from just over 100 ms to about 400 ms.
We removed the protocols we do not use from the agent configuration and tried increasing its CPU and memory limits, but that did not help.
We even set l4_log_tap_types: -1, but that did not help either.
We are using the latest version, 6.4.3 (agent and server). Is this the expected performance overhead of the agent?
Hello, there are some questions about this issue that need to be clarified:
What tool did you use for the benchmark? We use wrk2 (https://github.com/giltene/wrk2) ourselves, but found that this tool distorts RT under extreme TPS pressure.
Which testing method did you use: A) fixed TPS, comparing runs with and without deepflow-agent; or B) maximum achievable TPS, with and without deepflow-agent? If it was method A, please confirm that no single logical core reached 100% during the test, since a saturated core alone can cause a significant increase in RT. Method B, by definition, drives at least one core to 100%. We typically use method A and make sure no single core reaches 100% while deepflow-agent is running. For our testing method, please refer to: https://deepflow.io/blog/zh/030-deepflow-agent-ebpf-benchmark/
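As an illustration of method A, a minimal sketch (the wrk2 invocation, thread/connection counts, and target URL are hypothetical placeholders; -R caps the offered request rate so latency is measured at a fixed TPS rather than at saturation):

```shell
# Hypothetical fixed-TPS run with wrk2 (method A); -R caps the request rate,
# --latency prints percentiles corrected for coordinated omission:
#   wrk -t4 -c16 -d60s -R 1000 --latency http://<service>/<endpoint>
#
# While the load runs, verify no single logical core is pegged at 100%
# (e.g. with mpstat -P ALL 1). A rough aggregate check from /proc/stat:
read -r _ user nice system idle _ < /proc/stat
total=$((user + nice + system + idle))
busy=$((user + nice + system))
echo "aggregate cpu busy since boot: $((100 * busy / total))%"
```

The per-core view (mpstat -P ALL) is what matters here: the machine-wide average can look healthy while one core handling interrupts or a single-threaded hot path sits at 100%.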
What was the TPS during the stress test?
During the test, what were the overall CPU usage and system load of the machine?
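For reference, a quick way to capture that during the run (a sketch using standard Linux interfaces; the load-vs-cores comparison is a rule of thumb, not deepflow-specific):

```shell
# Compare the 1-minute load average against the number of logical cores:
# load1 approaching or exceeding the core count suggests CPU saturation,
# which by itself can explain a large RT increase.
cores=$(nproc)
read -r load1 _ < /proc/loadavg
echo "load1=${load1} cores=${cores}"
```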
Can you confirm that your kernel version is 4.1.19? I'm concerned it might be a typo, as it looks like 4.19.
Search before asking
DeepFlow Component
Agent
What you expected to happen
Our agent config file looks like this:

vtap_group_id: g-e7f3f8db93
tap_interface_regex: .*
process_threshold: 30
external_agent_http_proxy_enabled: 1
external_agent_http_proxy_port: 38086
static_config:
  ebpf:
    thread-num: 5
    on-cpu-profile:
      disabled: true
  l7-protocol-enabled:
How to reproduce
No response
DeepFlow version
6.4.3
DeepFlow agent list
DaemonSet, 7 pods
Kubernetes CNI
Antrea
Operation-System/Kernel version
4.1.19
Anything else
No response
Are you willing to submit a PR?
Code of Conduct