CMPUT401Group / analytic-engine

Bleeding edge of analytic things and stuff

The 'search' option #4

Open deanbittner opened 7 years ago

deanbittner commented 7 years ago

I didn't see the 'search' command line switch detailed in the --help output. The synopsis says it searches all metrics for those correlated with the given one. I was going to try something like:

node --max-old-space-size=4096 ./dist/cli.js search --metric1 invidi.webapp.localhost_localdomain.request.total_response_time.mean --m1_start 20:00_20161211 --m1_end 20:30_20161211 --normalisation

As best I can recall, this uses R to perform a similarity calculation between the given metric and each metric in graphite?

Would that work? What output might I expect?

MattDekinder commented 7 years ago

Yes, that will work: it correlates the target against every metric in the graphite database over the same time frame. It prints correlations to stdout as they are computed, and once it completes it creates a grafana dashboard file in the directory where the command was run. That dashboard displays the 30 metrics most linearly correlated with the target.
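As a rough illustration of the "top 30" step, here is a minimal sketch of ranking search results. The result shape (`{ metric, correlation }`), the `topCorrelated` helper, and the choice to rank by absolute value are all assumptions made for this example, not the engine's actual code.

```javascript
// Hypothetical result shape: { metric, correlation }, where correlation is
// either a number or the string 'NA' (as printed in the search output).
function topCorrelated(results, limit = 30) {
  return results
    // Drop entries whose correlation is undefined ('NA', NaN, ...).
    .filter((r) => typeof r.correlation === 'number' && Number.isFinite(r.correlation))
    // Assumption: rank by magnitude so strong negative correlations count too.
    .sort((a, b) => Math.abs(b.correlation) - Math.abs(a.correlation))
    .slice(0, limit);
}

// Made-up sample data, for illustration only.
const sample = [
  { metric: 'a.mean', correlation: 0.42 },
  { metric: 'b.max', correlation: 'NA' },
  { metric: 'c.count', correlation: -0.97 },
  { metric: 'd.min', correlation: 0.88 },
];

console.log(topCorrelated(sample, 2).map((r) => r.metric));
```

If the real search ranks by signed correlation rather than magnitude, the comparator would simply be `b.correlation - a.correlation`.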

Search does not normalize data, because it orders by linear correlation, which already takes standard deviation into account. It will not complain about the flag, though; this is probably something I should add, along with an explanation for search.
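To see why normalization is unnecessary when ordering by linear (Pearson) correlation, here is a small sketch. The `pearson` helper and the sample series are made up for this example; the engine itself reportedly delegates the calculation to R.

```javascript
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Pearson correlation: covariance divided by the product of the standard
// deviations. The 1/(n-1) factors cancel, so sums of squares are enough.
function pearson(xs, ys) {
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Made-up series, for illustration only.
const target = [1, 2, 4, 7, 11];
const other = [3, 5, 4, 9, 12];
// Same shape as `other`, but on a very different scale and offset.
const rescaled = other.map((y) => 1000 * y + 50);

const r1 = pearson(target, other);
const r2 = pearson(target, rescaled);
console.log(r1, r2); // equal up to floating-point rounding
```

Rescaling or shifting a series by a positive factor leaves the coefficient unchanged, so normalizing the data first would not change the ordering.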

Here is a sample of this command:

node dist/cli.js correlation --metric1 IN.stb-sim.dean.RequestTiming.count --m1_start 17:00_20160921 --m1_end 18:00_20160921 --dashboard-out /tmp/dashboard.json > /home/matt/search_output.txt

The output looks like this:

Completed: 1 / 11168 Metric: IN.stb-sim.dean.call.connect.AdsToKeepRequest.min value Stored [Correlation, Covariance]: [ 'NA', 0 ]
Completed: 2 / 11168 Metric: IN.stb-sim.dean.call.connect.AdsAvailable.mean value Stored [Correlation, Covariance]: [ 'NA', 0 ]
Completed: 3 / 11168 Metric: IN.stb-sim.dean.call.connect.AdsToKeepRequest.max value Stored [Correlation, Covariance]: [ 'NA', 0 ]
Completed: 4 / 11168 Metric: IN.stb-sim.dean.call.connect.AdsAvailable.percentile.75 value Stored [Correlation, Covariance]: [ 'NA', 0 ]
Completed: 5 / 11168 Metric: IN.stb-sim.dean.call.connect.AdsToKeepRequest.percentile.75 value Stored [Correlation, Covariance]: [ 'NA', 0 ]
Completed: 6 / 11168 Metric: IN.stb-sim.dean.call.connect.AdsAvailable.percentile.98 value Stored [Correlation, Covariance]: [ 'NA', 0 ]

I just realized that this output did not have the metric names, so I added them just now. Don't worry about the 'NA' correlations: those metrics are just horizontal lines in the time frame specified. Because the covariance is 0, we know these metrics didn't record anything except 0 or 'NA' in the intervals for this time frame.
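For anyone wondering where the [ 'NA', 0 ] pairs come from: a flat series has zero standard deviation, so the correlation works out to 0/0, which is undefined. A minimal sketch follows; the `correlationAndCovariance` helper and the sample data are hypothetical, and returning the string 'NA' only mimics what the output above shows.

```javascript
// Returns [correlation, covariance] in the same order as the search output.
function correlationAndCovariance(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((s, x) => s + x, 0) / n;
  const my = ys.reduce((s, y) => s + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  const covariance = sxy / (n - 1); // sample covariance
  const denominator = Math.sqrt(sxx * syy);
  // A constant series has zero variance, so the ratio would be 0/0.
  const correlation = denominator === 0 ? 'NA' : sxy / denominator;
  return [correlation, covariance];
}

// Made-up data: a varying target and a metric that stayed at 0 throughout.
const target = [1, 2, 4, 7, 11];
const flat = [0, 0, 0, 0, 0];

console.log(correlationAndCovariance(target, flat)); // [ 'NA', 0 ]
```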

Just a warning: this searching is a bit slow. I couldn't get multithreading to work properly with it in Node; I wasn't too familiar with it at the outset, but I think I can manage it now. It should speed up a ton, though I suspect that rewriting it in C or C++ would be far more efficient than NodeJS can be. Node is designed to be a web server, not for high-performance computing, so its version of forking carries a lot of overhead.

deanbittner commented 7 years ago

Thanks a lot Matt. I will pull that and give it a go. Are you looking for any opportunities to do some hourly-oriented work on this?

MattDekinder commented 7 years ago

I would be, definitely. But I will get to the addition to the help menu and some of the polishing of this searching process after my exams finish. Let me know if anything is not functioning as it should and I'll make sure to take a look at that too.

The dashboard output uses the same flag as the entailment-search, so specifying '--dashboard-out /dashboard.json' is the surest way to make sure it's created. I have only run a metric search to completion once since adding the dashboard-generating part, because it took so long. I'm a bit nervous that there is some problem with it, so I'll test it a bit on the weekend.