distributed-system-analysis / pbench

A benchmarking and performance analysis framework
http://distributed-system-analysis.github.io/pbench/
GNU General Public License v3.0
185 stars 108 forks source link

Very large values (2,937,379,629,571,585.00) in pidstat cpu_usage for two processes /opt/cni/bin/openshift-sdn #799

Open wabouhamad opened 6 years ago

wabouhamad commented 6 years ago

While running pbench-ansible from a pbench-controller host on an OCP 3.9 cluster, pidstat cpu_usage on one of the compute nodes "svt_node_4:ip-172-31-55-116" shows two processes /opt/cni/bin/openshift-sdn with a huge value for average: 2,937,379,629,571,585.00.

The csv data shows several processes for /opt/cni/bin/openshift-sdn and they to all seem to have zero values.

This appears to be a pidstat postprocessing issue.

The data are on the pbench server in the following directory:

/EC2::ip-172-31-40-62/ocp-3.9.14-rpm-docker1.13.1-500ppn-pausepods-kern-310-861-m4XL_160/tools-default/svt_node_4:ip-172-31-55-116/pidstat/cpu_usage.html The second compute node does not show this: /results/EC2::ip-172-31-40-62/ocp-3.9.14-rpm-docker1.13.1-500ppn-pausepods-kern-310-861-m4XL_160/tools-default/svt_node_3:ip-172-31-24-150/pidstat/cpu_usage.html
ndokos commented 6 years ago

Here's a link: http://pbench.perf.lab.eng.bos.redhat.com/results/EC2::ip-172-31-40-62/ocp-3.9.14-rpm-docker1.13.1-500ppn-pausepods-kern-310-861-m4XL_160/tools-default/svt_node_4:ip-172-31-55-116/pidstat/

Actually, the csv data also show a (different) large value: line 582, column 3144:1844674407370955264.00 for pid 54288 and line 588, column 3898: 1844674407370955264.00 for pid 67070.

And I see that last value in the pidstat stdout file:

629584: 1522193470     0     54288    0.00 1844674407370955264.00    0.00 1844674407370955264.00     0 1844674407370955264.00      0.00  114380    3552   0.02      0.00      0.00      0.00       0 1844674407370955264.00 1844674407370955264.00  /opt/cni/bin/openshift-sdn 
635438: 1522193530     0     67070    0.00 1844674407370955264.00    0.00 1844674407370955264.00     1 1844674407370955264.00      0.00  188368    3560   0.02      0.00      0.00      0.00       0 1844674407370955264.00 1844674407370955264.00  /opt/cni/bin/openshift-sdn

which likely makes it a pidstat bug (@sjug wins the prize!)

sjug commented 6 years ago

What do I win?

ndokos commented 6 years ago

@sjug: you get to fix pidstat.

sjug commented 6 years ago

I'm on it! Time to brush up on my C.

The issue is pidstat is coming up with these massive values? Why are they also different than the result graphical data?

ndokos commented 6 years ago

Yes, that is the issue. Not sure what you mean in the second question?

jeremyeder commented 6 years ago

there is something systemically wrong - we've seen this in tools like iostat, too. it could even be a kernel issue.