Closed — yuecchen closed this issue 5 years ago
Here is the log; if you skip to the end of the file, the duplication gets worse and worse: collect.log. Here is the sample app: acmeair-demo.zip
It is running appmetrics 4.0.1.
To clarify, Hui says "this is not a regression, according to Yue's testing, the problem exists in 4.01"
The log file does show approximately 30 entries per second, which matches the JMeter load, and there are a large number of duplicate data entries. We think it is a problem with the appmetrics http probe, but we are still investigating...
This might be an issue with the way JMeter regulates request throughput - I've just had this running using curl (eyeballing around 700 requests per minute) and didn't see duplicates. Attempting to raise the throughput using a more powerful machine than my Linux VM.
I've now run the test through JMeter on a more powerful machine, and while I am still getting identical timings and durations, I can prove that appmetrics is not losing requests. I added a Counter in JMeter and put its value into each request as a header. I then altered app.js to output data.requestHeader, and the resulting log shows that one and only one event was output per request, with no requests missed. It remains a concern that several events have exactly the same timestamp and duration, however, so I will continue investigating. Attached: JMeter plan and output log. output.log MCDefault.zip
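The verification approach above can be sketched as follows. This is a hypothetical illustration, not the actual JMeter plan or app.js change: the header name `x-request-counter` and both helper functions are assumptions. The idea is that the load driver stamps each request with an increasing counter header, the server logs the counter, and the logged counters are then scanned for gaps (lost requests) and duplicates (double-counted requests).

```javascript
// Hypothetical helper: pull the counter the load driver put in the
// request headers (Node lowercases incoming header names).
function extractCounter(headers) {
  const value = headers['x-request-counter'];
  return value === undefined ? null : Number(value);
}

// Hypothetical helper: given the counters recovered from the log,
// report any that appear twice and any that never appear at all.
function findAnomalies(counters) {
  const seen = new Set();
  const duplicates = [];
  for (const c of counters) {
    if (seen.has(c)) duplicates.push(c);
    seen.add(c);
  }
  const missing = [];
  const max = Math.max(...counters);
  for (let i = 1; i <= max; i++) {
    if (!seen.has(i)) missing.push(i);
  }
  return { duplicates, missing };
}
```

A clean run produces empty `duplicates` and `missing` arrays, which is what distinguishes "appmetrics is dropping events" from "several events merely share a timestamp".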
The problem exists because the stored 'context' is held at too high a level. When we instrument the http functions, we create a context object that is stored at the function level. The idea is that we use this object to store the start and end times of each request as it comes through the probe, and use those to calculate the duration. However, because this context object is stored at the function level rather than at the request level, a new request arriving before the previous one has completed resets the timing data. This leads to large numbers of small groups of requests sharing the same timing data when a lot of traffic is forced through, causing the original problem.

The solution is to move the context object down to the request level, where it was always meant to be. That would have a high impact across all probes, as we'd be moving it out of a common area into the individual probes, and this fix is needed urgently. Therefore I'll raise another issue to take care of the main work, and deliver a fix solely for the http- and https- probes that ignores the shared context and uses a local one instead. This radically reduces the size of the immediate change, so end users can be confident we're not drastically changing the framework.
@mattcolegate Thank you for the quick investigation and finding out the root cause.
@mattcolegate Could you please provide a test driver for the fix, so that we can verify it quickly?
I tested this last night myself and confirmed it worked, will release today
PR has been merged; closing issue
Steps to recreate the problem
The total count of requests is correct, so the duplication also means real requests are being lost.
The duplication gets worse and worse as the application runs; here is a chart for reference:
In this chart, the traffic is 2,000 requests per minute, and the y-axis value is the count of distinct requests grouped by minute.