# RunWith-IT Analytic Engine

This package is the Analytic Engine in the RunWith-IT stack.
## Requirements

- r-statistics
- Node 4.6.2
- npm (installed automatically with Node)
- gulp:

  ```
  npm install -g gulp
  ```

- For testing:

  ```
  npm install -g jasmine
  npm install -g node-gyp
  sudo apt-get install libkrb5-dev libgssapi-krb5-2
  npm install -g fibers
  ```

- MongoDB
## Instructions

```
npm install -g gulp
npm install
gulp                # builds src into the dist directory
cd dist
node r-adapter.js   # test file for exercising r-statistics with Node
```

- `node dist/long-running-process-test.js` compares all metrics; this takes about two hours depending on hardware and internet bandwidth.
## Testing

Tests use the Jasmine test framework and should all be placed in the spec/analytic-engine directory. Note that we do not test any async methods with Jasmine, as the framework seems to have issues with multithreaded tests.

```
npm install
npm test
```
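A spec in spec/analytic-engine might look like the following sketch. The `mean` helper and the file name are hypothetical (a real spec would `require()` code from dist/), and the shims at the top are only there so the example also runs standalone outside Jasmine:

```javascript
// spec/analytic-engine/mean.spec.js -- hypothetical example spec.
// Under Jasmine the real describe/it/expect globals are used; the
// shims below only kick in when the file is run directly with node.
const describe = global.describe || function (name, fn) { fn(); };
const it = global.it || function (name, fn) { fn(); };
const expect = global.expect || function (actual) {
  return {
    toBe: function (expected) {
      if (actual !== expected) {
        throw new Error('expected ' + expected + ', got ' + actual);
      }
    },
  };
};

// Hypothetical synchronous helper under test.
function mean(values) {
  return values.reduce(function (a, b) { return a + b; }, 0) / values.length;
}

describe('mean', function () {
  it('averages a list of numbers', function () {
    expect(mean([1, 2, 3, 4])).toBe(2.5);
  });
});
```

Note the helper is synchronous, in keeping with the rule above about not testing async methods with Jasmine.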
## Structure

- r-modules/ - Contains the *.R files that are called by the JavaScript files in the src/ directory.
- src/ - JavaScript source files. We use ES6 since it is awesome.
- dist/ - Does not exist until `gulp` has been run; contains the "compiled" .js files.
- spec/ - Contains the unit test directory.
## How To

### CRON TASK

Using the following API call, with the date and metric parameters set appropriately, you can retrieve a list of deviant data points:

```
http://162.246.157.107:8888/call?mdate1=1474110000&mdate2=1474111800&m1=invidi.webapp.localhost_localdomain.request.total_response_time.mean&m2=invidi.webapp.localhost_localdomain.database.request.findEtl.error_gauge&func=2
```
There is also a UI at http://runwithittest.azurewebsites.net/ which uses this API to compare metrics and find deviant points.
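A small helper for constructing such calls, using the host and parameter names from the example above (`mdate1`/`mdate2` are Unix timestamps, `m1`/`m2` are metric names, `func` selects the analysis function):

```javascript
// Build a /call URL for the analytic engine's HTTP API.
function buildCallUrl(host, opts) {
  var query = ['mdate1', 'mdate2', 'm1', 'm2', 'func']
    .map(function (key) { return key + '=' + encodeURIComponent(opts[key]); })
    .join('&');
  return 'http://' + host + '/call?' + query;
}

// Reproduces the example call above:
var url = buildCallUrl('162.246.157.107:8888', {
  mdate1: 1474110000,
  mdate2: 1474111800,
  m1: 'invidi.webapp.localhost_localdomain.request.total_response_time.mean',
  m2: 'invidi.webapp.localhost_localdomain.database.request.findEtl.error_gauge',
  func: 2,
});
```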
## TODO

- More robust interpolation of data points. Currently we may miss local minima and maxima in a dataset, which could skew the results of covariance and correlation analysis.
- Ensure that when comparing sets of data points, we compare points with the same time spacing. If a metric has differing intervals or gaps, that should be reflected in its number of data points. Currently we assume we are always comparing the same span of time, and we simply interpolate extra points in one metric to match the other. We always create interpolated sets with even spacing over the timeframe, and we need to ensure that holds for the other metric as well. That may mean interpolating both metrics, but first we should address whether interpolation is causing the data to lose points of interest that line up in time with points in the other data set.
- Create the API that talks to a front end of some kind. This API just needs to call the methods or functions that perform the analysis.
- Save the results of analysis in the database in case the program terminates.
- Provide a way to convert that saved output into a JSON Grafana dashboard (e.g. the top 20 metrics most correlated with the search should produce a dashboard with those metrics ordered on the page).
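The interpolation items above amount to resampling two metrics onto a common, evenly spaced time grid. A minimal linear-interpolation sketch (illustrative only, not the engine's current implementation; note that linear interpolation is exactly the step that can smooth away local minima and maxima between samples):

```javascript
// Linearly interpolate a metric (array of {t, v} points sorted by t)
// at an arbitrary timestamp, clamping outside the observed range.
function interpolateAt(points, t) {
  if (t <= points[0].t) return points[0].v;
  var last = points[points.length - 1];
  if (t >= last.t) return last.v;
  for (var i = 1; i < points.length; i++) {
    if (points[i].t >= t) {
      var a = points[i - 1];
      var b = points[i];
      var frac = (t - a.t) / (b.t - a.t);
      return a.v + frac * (b.v - a.v);
    }
  }
}

// Resample a metric onto n evenly spaced timestamps in [t0, t1],
// so two metrics can be compared point-for-point over the same span.
function resample(points, t0, t1, n) {
  var out = [];
  var step = (t1 - t0) / (n - 1);
  for (var i = 0; i < n; i++) {
    var t = t0 + i * step;
    out.push({ t: t, v: interpolateAt(points, t) });
  }
  return out;
}
```

Resampling both metrics onto the same grid (rather than stretching one to match the other) is one way to address the uneven-spacing concern above, at the cost of interpolating both data sets.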