RuntimeTools / appmetrics

Node Application Metrics provides a foundational infrastructure for collecting resource and performance monitoring data for Node.js-based applications.
https://developer.ibm.com/open/node-application-metrics/
Apache License 2.0
977 stars 126 forks source link

data for 'http' event is contains duplicated request #590

Closed yuecchen closed 5 years ago

yuecchen commented 5 years ago

Steps to recreate the problem

  1. With appmetrics v5.0.0, add the following code in application:
var appmetrics=require('appmetrics');
var monitor = appmetrics.monitor();
monitor.on('http', function(data){
console.info('joycelog:',data.time, data.duration, data.url, data.statusCode);
});
  1. Add a traffic of 2000 per minute with jmeter:
  2. collect the log, and you will see many duplicated requests:

image

The sum count of the requests is correct, so, the duplication also means lose real request.

The duplication become more and more, when the application runs for a while, here is a chart for reference:

image With this chart, the traffic is 2000 per minute, and the y-axis value is the count of the distinguish request group by minutes.

yuecchen commented 5 years ago

Here is the log, if you skip to the end of the file, the duplication is more and more. collect.log here is sample app: acmeair-demo.zip

And it is run by appmetrics 4.0.1

rnchamberlain commented 5 years ago

To clarify, Hui says "this is not a regression, according to Yue's testing, the problem exists in 4.01"

rnchamberlain commented 5 years ago

The log file does show approximately 30 entries per second, which matches the jmeter load, and there are a large number of duplicate data entries. We think it is a problem with the appmetrics http probe, but we are still investigating...

mattcolegate commented 5 years ago

This might be an issue with the way JMeter issues regulates request throughput - I've just had this running using curl (eyeballing around 700 requests per min) and didn't see duplicates. Attempting to up the throughput using a more powerful machine than my Linux VM.

mattcolegate commented 5 years ago

I've now run the test through Jmeter on a more powerful machine, and while I am getting identical timings and durations, I can prove that appmetrics is not losing requests. I added a Counter in Jmeter and put the value of that into each request as a header. I then altered app.js to output the data.requestHeader and the resultant log shows that one and only one event was output per request, with no requests missed. It remains a concern that several events have exactly the same timestamp and duration however, so I will continue investigating. Attached: Jmeter Plan and output log. output.log MCDefault.zip

mattcolegate commented 5 years ago

The problem exists because of the level of stored 'context' is too high. When we instrument the http functions we create a context object that is stored at the function level. The idea is that we use this object to store the start and end times of each request as it comes through the probe and use that to calculate our duration. However, because this context object is stored at the function level rather than at the request level, if a new request comes before the previous one has completed then it resets the timing data. This leads to large numbers of small groups of requests sharing the same timing data when a lot of data is forced through, causing the original problem. The solution is to move the context object down to the request level where it was always meant to be. This would have a high impact for all probes as we'd be moving it out of a common area into the individual probes, and this fix is required urgently. Therefore I'll raise another issue to take care of the main work, and deliver a fix solely for the http- and https- probes that will ignore the overall context and use a local one instead. This radically decreases the size of the immediate change so that the end users can be confident that we're not drastically changing the framework.

yuecchen commented 5 years ago

@mattcolegate Thank you for the quick investigation and finding out the root cause.

yuecchen commented 5 years ago

@mattcolegate Could you please help provide a test driver of the fix, so that we have have a quick verify.

tobespc commented 5 years ago

I tested this last night myself and confirmed it worked, will release today

mattcolegate commented 5 years ago

PR has been merged; closing issue