artilleryio / artillery

The complete load testing platform. Everything you need for production-grade load tests. Serverless & distributed. Load test with Playwright. Load test HTTP APIs, GraphQL, WebSocket, and more. Use any Node.js module.
https://www.artillery.io
Mozilla Public License 2.0

ensure.maxErrorRate check returns with "fail", even though no requests or `plugins.expect` failures shown #2277

Open NoelLH opened 10 months ago

NoelLH commented 10 months ago

We have a test that includes a general ensure setting intended to fail the test only on errors. However, in some cases the check reports "fail" even though there are no failed expectations and no requests with errors.

Our test includes:

config:
  ...
  ensure:
    maxErrorRate: 0.1 # Error rate above 0.1% -> non-zero exit code
  plugins:
    expect: {}
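
For context, the assertions that produce the plugins.expect.* counters in the report are defined at the request level in the scenario flow. A simplified, hypothetical flow (not our real script; the endpoint is made up for illustration) looks roughly like this:

scenarios:
  - name: Make a donation
    flow:
      - post:
          url: /donations   # hypothetical endpoint, for illustration only
          expect:
            - statusCode: 201
            - contentType: json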

Version info:

Artillery: 2.0.0-38
Node.js:   v20.9.0
OS:        linux

Running this command:

artillery run-fargate --count 3 --region eu-central-1 --output reports/the-output.json the-test.yaml

I expected to see this happen:

An "ok" from the plugin checks.

Instead, this happened:

All VUs finished. Total time: 4 minutes, 57 seconds

--------------------------------
Summary report @ 10:08:16(+0000)
--------------------------------

http.codes.200: ................................................................ 13448
http.codes.201: ................................................................ 1350
http.downloaded_bytes: ......................................................... 0
http.request_rate: ............................................................. 52/sec
http.requests: ................................................................. 14817
http.response_time:
  min: ......................................................................... 0
  max: ......................................................................... 5684
  mean: ........................................................................ 554.7
  median: ...................................................................... 314.2
  p95: ......................................................................... 2101.1
  p99: ......................................................................... 2725
http.responses: ................................................................ 14798
plugins.expect.ok: ............................................................. 21500
plugins.expect.ok.contentType: ................................................. 5358
plugins.expect.ok.equals: ...................................................... 1344
plugins.expect.ok.statusCode: .................................................. 14798
vusers.completed: .............................................................. 1316
vusers.created: ................................................................ 1350
vusers.created_by_name.Make a donation: ........................................ 1350
vusers.failed: ................................................................. 0
vusers.session_length:
  min: ......................................................................... 15783.5
  max: ......................................................................... 22212.3
  mean: ........................................................................ 17362.6
  median: ...................................................................... 17158.9
  p95: ......................................................................... 18588.1
  p99: ......................................................................... 19737.6
Test run completed: ttjwt_c65cey4kjx4caqc3fxhmdpenx4g4d_gaha [10:08:16]

Checks:
fail: maxErrorRate < 0.1

Files being used:

Partial config as above; I can share the full test script privately if needed.

bernardobridge commented 10 months ago

Hi @NoelLH 👋 . The ensure check for maxErrorRate does the following calculation:

((vusers.created - vusers.completed)/vusers.created * 100) <= ${maxErrorRate}

In your report, vusers.created was 1350 and vusers.completed was 1316, which gives an error rate (per the calculation above) of about 2.5%, so the check is failing because of that.
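
To make that concrete, here is a minimal Node.js sketch (not Artillery's actual implementation) that plugs the numbers from your summary report into that formula:

// Minimal sketch of the maxErrorRate check, using the numbers
// from the summary report above (not Artillery's source code).
const created = 1350;     // vusers.created
const completed = 1316;   // vusers.completed
const maxErrorRate = 0.1; // ensure.maxErrorRate from the config (a percentage)

const errorRate = ((created - completed) / created) * 100;
console.log(errorRate.toFixed(2));                      // 2.52
console.log(errorRate <= maxErrorRate ? 'ok' : 'fail'); // fail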

From the limited report I'm not sure why some vusers didn't complete, though. In your intermediate reports or logs, can you find any evidence of that? Are there any vusers.skipped, perhaps?

NoelLH commented 10 months ago

Hey @bernardobridge, good spot on the completed count, thanks. I'm confused as to the "why" though. Everything seemed to hold up well on the app side and the JSON with the intermediate reports contains no skipped keys.

I'm attaching that JSON in case it contains any more clues I've missed.

2023-11-06-1000-e2e.json

hassy commented 10 months ago

http.codes.200 + http.codes.201 add up to 14,798, which matches http.responses, as expected.

http.requests is 14,817 though, whereas we'd expect it to equal http.responses in the absence of any errors. So 14817 - 14798 = 19 requests are not accounted for.

If those 19 requests resulted in errors, though, we'd expect the difference between created and completed VUs to be 19, but it's 1350 - 1316 = 34. 🤔
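
Spelling the bookkeeping out as a quick sanity check (plain Node.js, nothing Artillery-specific):

// Re-derive the discrepancies from the counters in the summary report.
const responsesByCode = 13448 + 1350; // http.codes.200 + http.codes.201 = 14798
const requests = 14817;               // http.requests
const responses = 14798;              // http.responses
const created = 1350;                 // vusers.created
const completed = 1316;               // vusers.completed

console.log(responsesByCode === responses); // true -> every response has a recorded status code
console.log(requests - responses);          // 19   -> requests with no matching response
console.log(created - completed);           // 34   -> VUs that did not complete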

Something might've gone wrong with the aggregation of metrics - @NoelLH was this a distributed run on Lambda/Fargate or a local one?

hassy commented 10 months ago

@NoelLH happy to take a look at the full test script as well - hveldstra@artillery.io :)

NoelLH commented 10 months ago

Thanks @hassy – it was a distributed Fargate run, so something going wrong with the aggregation sounds likely. I'll send over the test script.