artilleryio / artillery

The complete load testing platform. Everything you need for production-grade load tests. Serverless & distributed. Load test with Playwright. Load test HTTP APIs, GraphQL, WebSocket, and more. Use any Node.js module.
https://www.artillery.io
Mozilla Public License 2.0

Discrepancies with dynatrace publish metrics plugin #2308

Open Thilaknath opened 10 months ago

Thilaknath commented 10 months ago

Version info:

2.0.0-38

Running this command:

  sh "/home/node/artillery/bin/run version"
  sh "/home/node/artillery/bin/run run --output reports/${testName}.json tests/performance/${testName}.yml"
  sh "/home/node/artillery/bin/run report --output reports/${testName}.html reports/${testName}.json"

I expected to see this happen:

The metrics reported in the report.json to be in sync with the metrics reported to dynatrace so we could plot graphs

Instead, this happened:

The metrics are out of sync. I am also noticing the following warnings and errors in the console.

Warning

  plugin:publish-metrics:dynatrace Dynatrace reporter: WARNING Metric key 'plugins.metrics-by-endpoint./armadillo (armadillo).codes.200' does not meet Dynatrace Ingest API's requirements and will be dropped. More info in the docs (https://docs.art/reference/extensions/publish-metrics#dynatrace). +1ms
  plugin:publish-metrics:dynatrace Dynatrace reporter: WARNING Metric key 'plugins.metrics-by-endpoint./dino (dino).codes.200' does not meet Dynatrace Ingest API's requirements and will be dropped. More info in the docs (https://docs.art/reference/extensions/publish-metrics#dynatrace). +0ms
  plugin:publish-metrics:dynatrace Dynatrace reporter: WARNING Metric key 'plugins.metrics-by-endpoint./pony (pony).codes.200' does not meet Dynatrace Ingest API's requirements and will be dropped. More info in the docs (https://docs.art/reference/extensions/publish-metrics#dynatrace). +0ms
  plugin:publish-metrics:dynatrace Dynatrace reporter: WARNING Metric key 'plugins.metrics-by-endpoint.response_time./armadillo (armadillo)' does not meet Dynatrace Ingest API's requirements and will be dropped. More info in the docs (https://docs.art/reference/extensions/publish-metrics#dynatrace). +0ms
  plugin:publish-metrics:dynatrace Dynatrace reporter: WARNING Metric key 'plugins.metrics-by-endpoint.response_time./dino (dino)' does not meet Dynatrace Ingest API's requirements and will be dropped. More info in the docs (https://docs.art/reference/extensions/publish-metrics#dynatrace). +0ms
  plugin:publish-metrics:dynatrace Dynatrace reporter: WARNING Metric key 'plugins.metrics-by-endpoint.response_time./pony (pony)' does not meet Dynatrace Ingest API's requirements and will be dropped. More info in the docs (https://docs.art/reference/extensions/publish-metrics#dynatrace). +0ms
  plugin:publish-metrics:dynatrace Sending metrics to Dynatrace +0ms

Error

Apdex score: 0.6377952755905512 (poor)
⠧   plugin:publish-metrics:dynatrace Sending event to Dynatrace +165ms
  plugin:publish-metrics:dynatrace Cleaning up +1ms
  plugin:publish-metrics:dynatrace Waiting for pending request ... +0ms
⠸   plugin:publish-metrics:dynatrace There has been an error in sending metrics to Dynatrace:  HTTPError: Response code 400 (Bad Request)
    at Request.<anonymous> (/usr/local/lib/node_modules/artillery/node_modules/got/dist/source/as-promise/index.js:118:42)
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  code: 'ERR_NON_2XX_3XX_RESPONSE',
  timings: {
    start: 1700595079319,
    socket: 1700595079319,
    lookup: 1700595079354,
    connect: 1700595079454,
    secureConnect: 1700595079560,
    upload: 1700595079560,
    response: 1700595079911,
    end: 1700595079911,
    error: undefined,
    abort: undefined,
    phases: {
      wait: 0,
      dns: 35,
      tcp: 100,
      tls: 106,
      request: 0,
      firstByte: 351,
      download: 0,
      total: 592
    }
  }
} +431

Files being used:

config:
  target: http://asciiart.artillery.io:8080
  phases:
    - duration: 15
      arrivalRate: 50
      rampTo: 100
      name: Warm up phase
    - duration: 10
      arrivalRate: 20
      rampTo: 50
      name: Ramp up load
#    - duration: 15
#      arrivalRate: 10
#      rampTo: 30
#      name: Spike phase
  plugins:
    ensure: {}
    apdex: {}
    metrics-by-endpoint: {}
    publish-metrics:
      - type: dynatrace
        # DY_API_TOKEN is an environment variable containing the API key
        apiToken: REPLACED-BY-PIPELINE
        envUrl: "https://apm.cf.company.dyna.ondemand.com/e/MASKED"
        prefix: "artillery."
        dimensions:
          - "service:ordService"
          - "test:crteOrd"
          - "host_id:1.2.3.4"
        event:
          title: "Loadtest"
          entitySelector: "type(SERVICE),entityName.equals(MyService)"
          properties:
            - "Tool:Artillery"
            - "Load per minute:100"
            - "Load pattern:development"
  apdex:
    threshold: 100
  ensure:
    thresholds:
      - http.response_time.p99: 100
      - http.response_time.p95: 75
  metrics-by-endpoint:
    useOnlyRequestNames: true

scenarios:
  - flow:
      - get:
          url: "/dino"
          name: dino
      - get:
          url: "/pony"
          name: pony
      - get:
          url: "/armadillo"
          name: armadillo

Also kindly note: I am publishing these metrics from a Jenkins job. When I execute the tests locally, the correct metrics are reported to Dynatrace, but when I run the job from Jenkins, something seems amiss. I have set up the job following the Artillery documentation.

Something to note: when executing locally, I run one test at a time and metrics are published properly. When the Jenkins job runs one test at a time, metrics are also published correctly. The issue only appears when more than one test runs, with one test reporting wrong metrics.

Kindly refer to the screenshots from the Jenkins console and the screenshot showing the metric reported in Dynatrace: Screenshot 2023-11-21 at 2 56 00 PM, Screenshot 2023-11-21 at 2 58 55 PM

Thilaknath commented 10 months ago

@InesNi Kindly help with this.

InesNi commented 10 months ago

Hi @Thilaknath !

Thanks for the thorough report 🙏🏻

Warnings

Regarding the warnings, this is due to the characters in the metric names generated by metrics-by-endpoint. As you can see in the report, even though you've set useOnlyRequestNames, it isn't actually being used in the metric names. The reason is that you've set it outside of the plugins config. Unlike apdex and ensure, the metrics-by-endpoint plugin must have its configuration set under config.plugins.

So if you do it like this, it should work 🤞🏻 :

  plugins:
    metrics-by-endpoint:
      useOnlyRequestNames: true
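
For completeness, here's a sketch of how the config from your script above would look with that change applied (only the placement of useOnlyRequestNames changes; everything else stays as you posted it):

  config:
    plugins:
      ensure: {}
      apdex: {}
      metrics-by-endpoint:
        useOnlyRequestNames: true   # moved here from the top-level config block
      publish-metrics:
        - type: dynatrace
          # ...rest of the dynatrace reporter settings unchanged
    apdex:
      threshold: 100
    ensure:
      thresholds:
        - http.response_time.p99: 100
        - http.response_time.p95: 75

apdex and ensure keep their top-level blocks; only the metrics-by-endpoint options move under config.plugins.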

Metrics discrepancies in parallel runs

Regarding this, we'll investigate it on our side as soon as possible. In the meantime, I have some questions:

Thanks again!

Thilaknath commented 10 months ago

@InesNi Thank you for your response. I will modify my scripts so that metrics-by-endpoint is properly configured. Regarding the metrics sent to Dynatrace:

1) I use the dimensions I set in my test script to filter the metrics for scenarios in the Dynatrace UI. As you can see from my script above, I set the following, and in Dynatrace I split the graph based on the test dimension:

        dimensions:
          - "service:ordService"
          - "test:crteOrd"
          - "host_id:1.2.3.4"

In the image below, the purple line shows the correct metrics for one of my tests, while the yellow line is reporting wrong metrics.

Screenshot 2023-11-23 at 9 24 46 AM

2) Both tests were sending around 4500 requests in total.

3) I am seeing fewer metrics for one of my tests when two tests are executed. Regarding your comment "perhaps the 59 was from a single intermediate report?": I don't think that's the case either, as I searched the JSON manually for the value and did not find that metric reported in any of the intermediate reports.

4) Zooming in doesn't make any difference.

5) There is definitely no delay in getting the metrics; even if I wait after the test completes, I don't see the values reported in report.json. Regarding whether there is an API rate limit in Dynatrace, I will have to check with the central team.

6) Kindly let me know if there is any more information I can provide.

Thilaknath commented 10 months ago

Hello team, is it possible to get an update on this ticket? @hassy Thank you

InesNi commented 10 months ago

Hi @Thilaknath,

Thank you for all the details.

We are still looking into this and will let you know as soon as we have more info. I do have a couple of questions:

Thilaknath commented 10 months ago

Hello @InesNi, yes, there are rate limits, following the specifications here: https://docs.dynatrace.com/docs/extend-dynatrace/extend-metrics#limits

Some more findings from running the tests locally. Below you will see screenshots from my terminal showing the summary, as well as the metrics I see in Dynatrace. (Note: this was while two tests were running in parallel.)

updateTestParallel

parallelRunDynatrace

From these screenshots (we are interested in the yellow-coloured plots): the summary says there were around 6750 http.202.count for the test that was executed, but the graphs show different values.

Note: I also tried running just one test and noticed that, even for a single test, there are 3 different data points in Dynatrace. The sum of all the counts comes close to the summary shown in my terminal, but this messes up the visualization in Dynatrace, as you cannot see how a test degrades or improves over time with new features.

Ideally, for a single run it would be nice to have a single set of metrics published. Another screenshot showing the 3 plots in Dynatrace:

Screenshot 2023-11-30 at 9 44 00 AM

InesNi commented 10 months ago

Hi @Thilaknath 👋🏻

Apologies that it took me a bit to look into this; I've been trying to replicate it using an actual Dynatrace setup 👍🏻 Here are my findings:

Regarding getting a single set of metrics published

The publish-metrics plugin (and its reporters) sends metrics to the platform continuously while the test is running (these are called intermediate reports). This is by design, as it allows you to visualise performance over the course of a test while it is running. That's particularly important for response_time metrics, as it lets you see whether they've gone up over the duration of the test and correlate them with other metrics (e.g. from your own system). For example:

Screenshot 2023-12-04 at 20 55 10

It might be good to use different visualisations for different types of data. For count type metrics, you might want to use the Single value or Top list visualisations, so you can obtain a single value. For example:

Screenshot 2023-12-04 at 20 48 52

Additionally, make sure you are using attributes to slice the data in ways that make it easier to visualise. Things like unique ids (for example to group test runs together), names, commit SHAs, etc., can all help you visualise it better. We leave this sort of decision about how best to visualise the data up to users, as we can't have expertise in all the observability platforms we support. But hopefully the above made sense and helped!
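
To make that concrete, here's a hypothetical sketch based on the dimensions block from the config earlier in the thread, extended with extra attributes for grouping runs; the run_id and commit values are assumptions and would need to be filled in by your pipeline (e.g. from the Jenkins build number and git SHA) before the test runs:

  publish-metrics:
    - type: dynatrace
      apiToken: REPLACED-BY-PIPELINE
      envUrl: "https://apm.cf.company.dyna.ondemand.com/e/MASKED"
      prefix: "artillery."
      dimensions:
        - "service:ordService"
        - "test:crteOrd"
        - "host_id:1.2.3.4"
        # hypothetical extra dimensions for slicing the data in Dynatrace:
        - "run_id:jenkins-build-1234"   # e.g. substituted from the Jenkins build number by the pipeline
        - "commit:abc1234"              # e.g. the git SHA of the build under test

Splitting the charts by a run_id dimension like this would let you compare runs over time without data points from different runs overlapping.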


Regarding missing metrics

I have not been able to replicate this when running 2 tests in parallel. I think there are two things that could be at play here:


Implementing aggregated metrics only

If you still feel that you don't want intermediate reports, and only want the aggregate values to make it to the platform, we could look into eventually implementing that. There has been another feature request for this for a different reporter.

But again, you'll lose out on the information of what happened over the duration of the test.


Hope that helps!

Thilaknath commented 10 months ago

Thank you @InesNi for the detailed response. I will take a look to see how I can create better reporting in this case.