Closed NikCanvin closed 4 years ago
I think this is a bug, because the performance feature promises to help the user detect performance changes between code changes, yet the data is broken due to Codewind architectural behaviour/design, so the feature does not do what the user expects. I realise, though, that it's a big change, so we consider it more 'feature-like' to address.
To @tobespc: following the 0.7.0 release, with a new metrics architecture, I retested the performance feature.
Unfortunately, the same issue is seen here:
Identical runs cause the performance to degrade (despite no code changes). An app restart restores the performance, until multiple further runs cause it to degrade again!
The metrics collections are sampled and recorded by AppMetrics, which runs within the project container. Codewind PFE tells the project to start and stop collecting metrics using a few different endpoints.
There should be one collection per load run, which will include summary data similar to:
```json
{
  "id": 2,
  "time": {
    "data": {
      "start": 1580131139857,
      "end": 1580131165304
    },
    "units": {
      "start": "UNIX time (ms)",
      "end": "UNIX time (ms)"
    }
  },
  "cpu": {
    "data": {
      "systemMean": 0.0429953625,
      "systemPeak": 0.0532637,
      "processMean": 0.0016209041249999998,
      "processPeak": 0.0037047
    },
    ......
  }
}
```
Each summary contains a start and an end timestamp; the difference between them is the duration of the recording:
"start": 1580131139857, "end": 1580131165304
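As a sanity check, the recording duration can be derived directly from those two timestamps. A minimal sketch (the `summary` shape mirrors the JSON excerpt above):

```javascript
// Derive the recording duration from a metrics summary.
// The summary shape mirrors the JSON excerpt above.
const summary = {
  id: 2,
  time: {
    data: { start: 1580131139857, end: 1580131165304 },
    units: { start: 'UNIX time (ms)', end: 'UNIX time (ms)' },
  },
};

function recordingDurationMs(s) {
  const { start, end } = s.time.data;
  return end - start;
}

console.log(recordingDurationMs(summary)); // 25447 ms (~25.4 s)
```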
Ideally, load is run against the project only while the metrics are being recorded. Currently, Codewind PFE applies load for a project-specific duration (for example, 60 seconds), and that should coincide with the duration of the metrics recording, e.g.:
Start recording metrics
Start load test time for 60 seconds
Apply load
Apply load
Apply load
Apply load
Apply load
Apply load
Load timer expires
Stop recording metrics
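The intended sequencing above can be sketched as a small synchronous simulation (the function and event names are illustrative, not the actual loadrunner API):

```javascript
// Hypothetical simulation of the intended sequencing: load is only
// applied while the recording window is open, so every measured
// request falls inside [start, start + windowMs].
function runIdealLoadTest(windowMs, requestIntervalMs) {
  const events = [];
  let clock = 0;                          // simulated clock, ms

  events.push('start recording metrics');
  const deadline = clock + windowMs;      // load timer

  while (clock + requestIntervalMs <= deadline) {
    clock += requestIntervalMs;
    events.push(`apply load @ ${clock}ms`);
  }

  events.push('load timer expires');
  events.push('stop recording metrics');  // stop arrives immediately
  return events;
}

const events = runIdealLoadTest(60000, 10000);
console.log(events);
```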
You might expect the metrics to be collected for only 60 seconds; however, that is not always the case, and I think that's part of the bug.
What is actually happening is:
Start recording metrics
Start load test time for 60 seconds
Apply load
Apply load
Apply load
Apply load
Apply load
Apply load
Load timer expires
Apply load <-- lots of outstanding in-flight load requests
Stop recording metrics
Under extreme load, the project is still busy handling and responding to the in-flight requests from loadrunner. It may not process the request to turn off the metrics recorder until some time after the load run has finished. That means we cannot guarantee that the metrics summary covers only the requests made during a specific time window, since requests to project URLs may continue to stream into the container and be measured until the collection STOP is received.
One way around this would be to have AppMetrics turn the metrics collection on and off within the project container itself, rather than being told to. If AppMetrics started a timer at the point where recording started, it would keep recording until that timer expired, regardless of load ending late. When Codewind then asks for the metrics, it would retrieve the recorded snapshot for just that time window. This would be a change to AppMetrics, JavaMetrics, and SwiftMetrics, but it would get us closer to measuring load for x minutes, which we currently do not do.
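A rough sketch of that idea, using a hypothetical in-container collector (this is not the actual AppMetrics API): the deadline is fixed when recording starts, and samples arriving after it are discarded, so a late STOP can no longer stretch the measured window.

```javascript
// Hypothetical in-container timed collection: the deadline is fixed at
// start time, and samples arriving after it are discarded, so a late
// "stop" request cannot stretch the measured window.
class TimedCollection {
  constructor(startMs, durationMs) {
    this.start = startMs;
    this.deadline = startMs + durationMs;
    this.samples = [];
  }

  record(sample) {
    // Ignore anything outside the fixed window, e.g. in-flight
    // requests still draining after the load timer expired.
    if (sample.timestamp <= this.deadline) this.samples.push(sample);
  }

  // What Codewind would retrieve afterwards: a snapshot scoped to
  // exactly [start, start + duration].
  stash() {
    return {
      time: { data: { start: this.start, end: this.deadline } },
      sampleCount: this.samples.length,
    };
  }
}

const c = new TimedCollection(0, 60000);
c.record({ timestamp: 30000, cpu: 0.02 }); // inside the window: kept
c.record({ timestamp: 75000, cpu: 0.05 }); // late in-flight: dropped
console.log(c.stash());
```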
I'm not saying that will solve the entire issue of performance worsening after each load run, but we have to at least get to a point where the expected run duration is consistent and not left open to outside influences.
Ran a test script to check the behaviour described above. Results:
{"id":3,"time":{"data":{"start":1580135082273,"end":1580135107995},.....
Duration = 1580135107995 - 1580135082273 = 25.7 seconds
The 25 seconds includes the 15 seconds of sleep, which is where any number of requests could still arrive in the project and skew the summaries.
Made some changes to appmetrics and can now get the timed collection to expire within a few milliseconds of the timed capture.
Work completed so far:
New endpoints for controlling metrics (for Node, `metricsName` = `appmetrics`; for Java, `metricsName` = `javametrics`):
GET http://{project}/{metricsName}/api/v1/collections
POST http://{project}/{metricsName}/api/v1/collections/{timeInSeconds}
GET http://{project}/{metricsName}/api/v1/collections/{collectionID}/stashed
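For illustration, the collection-control URLs can be assembled from the project host and the runtime-specific metrics name. A small helper (the host `myproject:9080` is made up; only the endpoint paths come from the list above):

```javascript
// Build the collection-control URLs from the endpoints listed above.
// metricsName is 'appmetrics' for Node and 'javametrics' for Java.
function collectionUrls(projectHost, metricsName) {
  const base = `http://${projectHost}/${metricsName}/api/v1/collections`;
  return {
    list: base,                                                 // GET
    create: (timeInSeconds) => `${base}/${timeInSeconds}`,      // POST
    stashed: (collectionID) => `${base}/${collectionID}/stashed`, // GET
  };
}

const urls = collectionUrls('myproject:9080', 'appmetrics');
console.log(urls.create(60));
// http://myproject:9080/appmetrics/api/v1/collections/60
```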
/close
Codewind version: 0.5.0 OS: MacOs
Che version: na IDE extension version: 0.5.0 IDE version: VSCode Kubernetes cluster: na
Description: As part of a perf blog in progress here: https://github.ibm.com/dev-ex/devAdvocacy/issues/147, I have tried to use the Performance Dashboard for 2 days, to demo a perf enhancement in Node.js v13 (over v12). When I load up the microservice, not all the requests hit the microservice as expected; instead, loadrunner buffers the requests until the microservice is free to process them. The microservice response time is technically correct; however, I'd expect all the requests to be fired at the microservice per my parameters (in edit load settings) and the actual response time to be much longer. I think this is a BUG.
I suspect we also need to add new features:
Steps to reproduce:
Workaround: