apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0

Java agent drops segment data #3699

Closed: niyanchun closed this 5 years ago

niyanchun commented 5 years ago

Continuing from issue #3692: as @wu-sheng replied, the OAP server and the storage (we use ES) may be the bottleneck that causes the agent to drop segment data. But in our test the OAP server's CPU usage is low (below 30%), and memory and disk are also sufficient. ES is under low load as well. Is there any other way to determine whether the OAP server is the bottleneck? Thanks in advance.

wu-sheng commented 5 years ago

Why open a new one? If you are saying that everything is sufficient, I can't tell you more.

niyanchun commented 5 years ago

First, sorry for opening a new one. I had created an issue before and kept asking questions after it was closed, but got no further reply. So I assumed a closed issue would not be handled any more. Also, in general, other GitHub projects only close an issue once it is resolved or after the author has not responded for a long while.

Second, I also find it strange that OAP and ES are not under high load (CPU, memory, and disk usage are all low), yet the agent drops data because its buffer is full; that's why I created an issue here. If OAP were the bottleneck, would it show high CPU usage? Or is there something else that would tell me definitively that OAP is under high load?

Below is the OAP server load in nmon:

[Screenshot: OAP server load in nmon]
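The buffer-full drops described above can be reasoned about with a bounded, non-blocking buffer. The sketch below is hypothetical: the class and function names are illustrative and are not SkyWalking's agent code, and it assumes the agent discards a segment when its local buffer is full rather than blocking the traced thread. Under that assumption, whether data is dropped depends only on how fast segments leave the buffer compared with how fast they arrive, so a slow send path can fill the buffer even while the receiving server's CPU stays low.

```python
from collections import deque


class DropOnFullBuffer:
    """Hypothetical bounded FIFO that rejects new items once it is full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque = deque()
        self.dropped = 0

    def offer(self, item) -> bool:
        # Non-blocking: the producer never waits; a full buffer means the item is lost.
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(item)
        return True

    def drain(self, max_items: int) -> list:
        # The consumer (e.g. a sender thread) takes at most max_items per call.
        batch = []
        while self.items and len(batch) < max_items:
            batch.append(self.items.popleft())
        return batch


def simulate(capacity: int, produced_per_tick: int, sent_per_tick: int, ticks: int) -> int:
    """Run a deterministic producer/consumer loop and return the number of dropped items."""
    buf = DropOnFullBuffer(capacity)
    seq = 0
    for _ in range(ticks):
        for _ in range(produced_per_tick):
            buf.offer(seq)
            seq += 1
        buf.drain(sent_per_tick)
    return buf.dropped


if __name__ == "__main__":
    print(simulate(capacity=300, produced_per_tick=100, sent_per_tick=100, ticks=50))
    print(simulate(capacity=300, produced_per_tick=120, sent_per_tick=100, ticks=50))
```

With a capacity of 300, producing 100 items per tick and sending 100 drops nothing. Producing 120 and sending 100 fills the buffer by the 11th tick and then drops 20 items on every tick after that (800 over 50 ticks), regardless of how busy the consumer's CPU is.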

niyanchun commented 5 years ago

We changed the storage from ES to H2 (so storage should not be the bottleneck), but the agent still drops data. The OAP load is as follows:

[Screenshots: OAP server load with H2 storage]

It seems the consumer threads are idle most of the time.

wu-sheng commented 5 years ago

H2 is for demo use only. OAP backend observability is available through Prometheus and Grafana. It is covered in the documentation; please read it.

As for CPU usage, I am not sure. I am not working on that part yet.
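The Prometheus-based OAP observability mentioned above means OAP's internal metrics can be read as Prometheus text exposition format. The sketch below assumes you have already fetched that text from the OAP telemetry endpoint (enabling it is covered in the SkyWalking documentation). The metric name `hypothetical_segments_received_total` and its labels are made up for illustration; look up the real metric names in the telemetry output. The parser is simplified and is not a full implementation of the exposition format.

```python
def parse_prometheus_text(text: str) -> dict:
    """Map 'name{labels}' -> float for each sample line in Prometheus text format.

    Simplified sketch: skips '#' comment/metadata lines, ignores an optional
    trailing timestamp, and does not handle escaped quotes inside label values.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            key, rest = parts[0], parts[1:]
        samples[key] = float(rest[0])  # float() also accepts "NaN" and "+Inf"
    return samples


def metric_total(samples: dict, name: str) -> float:
    """Sum one metric across all of its label combinations."""
    return sum(v for k, v in samples.items() if k.split("{", 1)[0] == name)


def per_second(before: float, after: float, seconds: float) -> float:
    """Average rate of a counter between two scrapes taken `seconds` apart."""
    return (after - before) / seconds


SAMPLE_SCRAPE = """\
# HELP hypothetical_segments_received_total Segments received by OAP (hypothetical name)
# TYPE hypothetical_segments_received_total counter
hypothetical_segments_received_total{protocol="grpc"} 1200
hypothetical_segments_received_total{protocol="http"} 300
"""

if __name__ == "__main__":
    samples = parse_prometheus_text(SAMPLE_SCRAPE)
    total = metric_total(samples, "hypothetical_segments_received_total")
    print(total)
```

Taking two scrapes a known interval apart and comparing counter rates on the OAP side with what the agents report as sent is one way to see whether data is being lost before it reaches OAP or after.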