LinkedInAttic / naarad

Naarad is a highly configurable system analysis tool that parses and plots timeseries data for better visual correlation. Naarad was built to help in performance analysis and investigations.
Apache License 2.0
238 stars 62 forks source link

Aggregation Incorrect for QPS #337

Open richardhsu opened 9 years ago

richardhsu commented 9 years ago

If aggr_metrics=none is specified, the qps count is not correct. For instance, for the below example: It reports qps = 1 when it should be 4 and 1.

Also it only seems to plot second-level data points such that it only plots the last data points.

data.csv

1433409051000,30
1433409051200,10
1433409051300,30
1433409051400,16
1433409061000,30

naarad.conf

[Latency]
infile=data.csv
sep=,
columns=test
aggr_metrics=none

[GRAPH]
graphing_library=matplotlib
feng-tao commented 9 years ago

Attached an email I sent earlier:

Hi,

After I explore a little bit, I don’t think the combine of “aggr_metrics=none" and qps reported correctly >could be solved easily. If we want to disable sub-second aggregation, naarad will generate one record >per timestamp into csv file. Then the qps generation logic reads the file and consider each entry as >separate record. In this case this qps is no longer qps any more but query per subsecond which does not >have much meanings.

Either rewrite the qps logic which will take some time or just forget about using qps when you don’t want >to use second aggregation.

-Tao

zhenyun commented 9 years ago

CSV file is generated in correctly. We should fix the qps issue , otherwise it is not usable. :) I can put it as my next sprint story if nobody picks up ( a good excise for new hires though).

RiteshMaheshwari commented 9 years ago

@zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)

zhenyun commented 9 years ago

In my case, I did not specify anything. The metric is custom metric (not sar output). QPS (the csv file) is wrong.

On Fri, Aug 28, 2015 at 9:31 AM, Ritesh Maheshwari <notifications@github.com

wrote:

@zhenyun https://github.com/zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)

— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135825976.

Zen (Zhenyun Zhuang, Performance Team)

RiteshMaheshwari commented 9 years ago

Okay, feel free to pick up the investigation/story since you have the full context.

On Fri, Aug 28, 2015 at 9:40 AM, zhenyun notifications@github.com wrote:

In my case, I did not specify anything. The metric is custom metric (not sar output). QPS (the csv file) is wrong.

On Fri, Aug 28, 2015 at 9:31 AM, Ritesh Maheshwari < notifications@github.com

wrote:

@zhenyun https://github.com/zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)

— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135825976.

Zen (Zhenyun Zhuang, Performance Team)

— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135827922.

zhenyun commented 9 years ago

Sample data: cat q.csv 2015-08-25 00:00:00.0035,1 2015-08-25 00:00:01.0178,1 2015-08-25 00:00:01.0285,1 2015-08-25 00:00:01.0370,1 2015-08-25 00:00:01.0478,1 2015-08-25 00:00:01.0585,1 2015-08-25 00:00:01.0670,1 2015-08-25 00:00:01.0678,1 2015-08-25 00:00:02.0494,1 2015-08-25 00:00:02.0503,1 Config: naarad.q.conf [Query-QPS] infile=q.csv columns=query sep=,

[GRAPH] graphing_library=matplotlib Generated results: out/resources/Query-QPS.qps.csv 1440460800003,1.0 1440460801017,1.0 1440460801028,1.0 1440460801037,1.0 1440460801047,1.0 1440460801058,1.0 1440460801067,2.0

feng-tao commented 9 years ago

I could take a look at this bug next week if no one pick it up.

richardhsu commented 9 years ago

@zhenyun Can you take a look at this again, please install from the master branch as I used your CSV file and configuration and have this as my Query-QPS.qps.csv file:

1440460800000,1.0
1440460801000,7.0
1440460802000,2.0
richardhsu commented 9 years ago

I'm honestly confused how I was able to get the above output as any subsequent tries gave me an error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/utils.py", line 809, in parse_and_plot_single_metri
cs
    if metric.parse():
  File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/metrics/metric.py", line 309, in parse
    aggregate_timestamp, averaging_factor = self.get_aggregation_timestamp(ts, self.aggregation_granularity)
  File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/metrics/metric.py", line 193, in get_aggregation_ti
mestamp
    if granularity.lower() == 'none':
AttributeError: 'NoneType' object has no attribute 'lower'

I then went to add quickly to metric.py:

if granularity is None or granularity.lower() == 'none':
  return int(timestamp), 1
...

And then got the results that @zhenyun has. Thus we want if aggr_metrics=none is set then it just uses the timestamps to calculate QPS so if same timestamps then it aggregates. If we put aggr_metrics=second then it calculates correctly.

Questions:

  1. Does this mean if aggr_metrics is NOT defined then it should be based on the timestamps given OR based on seconds granularity?
  2. As @feng-tao has mentioned, if aggr_metrics=none then QPS will be incorrect. Do we want to assure that QPS is always granularity calculated by seconds? Or if aggr_metrics is set then it calculates an average of the QPS's based on aggregation? Or does it calculate as queries per minute if aggr_metrics=minute etc.
zhenyun commented 9 years ago

Feel the default should be 'second', after all, qps stands for query per second. That is, if no "aggr_metrics" line, use second;

richardhsu commented 9 years ago

@feng-tao We've found out that actually aggr_metrics is being used incorrectly. At some point we pushed a change in which aggregation_granularity = aggr_metrics and this is incorrect since granularity specifies second, minute, etc whereas aggr_metrics defines a set of metrics that should be aggregated.

If you could look at this sometime, that'd be great, it is based on this merge commit from your pull request: https://github.com/linkedin/naarad/commit/239c936b0140af9072abdad3439b06cc48168966.

The aggr_metrics should not be affecting the aggregation_granularity and it should be that you set aggregation_granularity=none if you don't want to aggregate (it'll aggregate by the timestamps given in the file) otherwise defaults to second for the standard aggregation across seconds.

feng-tao commented 9 years ago

@richardhsu Thanks. I will submit a pull request to fix this. In the mean time, I think by setting aggr_metrics=second is temp work around.