artilleryio / artillery

The complete load testing platform. Everything you need for production-grade load tests. Serverless & distributed. Load test with Playwright. Load test HTTP APIs, GraphQL, WebSocket, and more. Use any Node.js module.
https://www.artillery.io
Mozilla Public License 2.0

Artillery has a memory leak and uses an unreasonably high amount of RAM #1979

Closed. Raynos closed this issue 1 year ago

Raynos commented 1 year ago

[screenshot: memory usage graph]

The RAM usage went up to 8 gigabytes.

hassy commented 1 year ago

Thanks @Raynos. Can you share any more information that could help reproduce the issue? How long had the test been running, and are any plugins or custom code being used?

Raynos commented 1 year ago

The test case is pretty small:

scenarios:
  - name: 'Fetch total raised'
    flow:
      - post:
          headers:
            Content-Type: application/json
            X-Sound-Client-Key: '{{clientKey}}'
          url: '/graphql?op=total-raised'
          json:
            query: |
              query TotalRaisedPlatformLoadTest {
                totalRaisedPlatform {
                  usd
                  ethInWei
                }
              }
          capture:
            json: '$.data.totalRaisedPlatform.usd'
            as: 'totalRaisedUsd'

The config is as follows:

config:
  target: 'https://example.com'
  phases:
    - duration: 120
      arrivalRate: 100
      rampTo: 1000
  variables:
    clientKey: '{{ $processEnvironment.SERVICE_API_KEY }}'
    environment: 'staging'
  plugins:
    expect: {}
    metrics-by-endpoint:
      useOnlyRequestNames: true
    publish-metrics:
      - type: datadog
        # DD_API_KEY is an environment variable containing the API key
        apiKey: '{{ $processEnvironment.DD_API_KEY }}'
        prefix: 'artillery.publish_metrics_plugin.'
        tags:
          - 'environment:{{environment}}'
          - 'service:core-api'
  processor: 'processor.js'
  http:
    extendedMetrics: true
  ensure:
    maxErrorRate: 1
    max: 500

hassy commented 1 year ago

Thanks for sharing the script @Raynos. We're looking at another potential memory issue right now (that one is to do with longer soak tests), and will try to reproduce the issue you're seeing as well.

bernardobridge commented 1 year ago

Hey @Raynos!

Thanks for the detailed reports, they were very helpful. As @hassy mentioned, I have been looking into another memory leak. That was, however, unrelated to this one (it was in Fargate, and has now been resolved). While replicating your setup, I also made a small improvement to the initial memory footprint of the Datadog reporter.

That being said, I haven't found evidence of a memory leak in Artillery at this time. As Hassy mentioned in the other issue you opened (https://github.com/artilleryio/artillery/issues/1978), the high resource consumption should be a direct result of the large number of TCP connections being opened in a short time, as Artillery models a real workload.
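For rough context, a back-of-the-envelope estimate (assuming the ramp from 100 to 1000 arrivals per second is roughly linear over the 120-second phase in the script above):

    total arrivals ≈ (100 + 1000) / 2 × 120 s = 66,000 virtual users in two minutes

Each arrival is a new virtual user with its own connection, which is the scale of concurrent work the explanation above refers to.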

While we may make further improvements in the future, right now we recommend using the loop construct, or distributed load testing with Lambda or Fargate, in situations like this, which is why we now provide those as open source!
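To illustrate the loop-based option, a minimal sketch based on the scenario posted earlier (the count value is arbitrary and only for illustration):

scenarios:
  - name: 'Fetch total raised (looped)'
    flow:
      # Each virtual user repeats the request, instead of a new virtual
      # user (and connection) being created for every arrival.
      - loop:
          - post:
              url: '/graphql?op=total-raised'
              headers:
                Content-Type: application/json
                X-Sound-Client-Key: '{{clientKey}}'
              json:
                query: |
                  query TotalRaisedPlatformLoadTest {
                    totalRaisedPlatform {
                      usd
                      ethInWei
                    }
                  }
        count: 100   # iterations per virtual user

For the distributed option, the same script is launched on AWS via Artillery's Fargate or Lambda runners (for example artillery run-fargate with a worker count option); the exact command and flags depend on the installed version.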

Thanks!

Raynos commented 1 year ago

It should never use 9 gigabytes of memory. That's the most obvious memory leak I've ever seen in my life.

Raynos commented 1 year ago

I'd prefer it to kill itself and fail with an OOM exception at some reasonable number like 2 or 4 gigabytes rather than continue growing unbounded.

davidverholen commented 1 year ago

Hm, I got the same problem, with Fargate + Playwright in my case. I updated Artillery to the latest version (2.0.0-36) after reading this issue.

The first 3 runs in the screenshot are with the default launch config. For the last run I overrode the launch config to use the maximum memory setting (that Fargate allows) of 12 GB, and memory still skyrockets to 100%.

[Screenshot taken 2023-08-23 at 15:27:55]

UPDATE

Actually, I just read that Fargate updated the limits. I just had to change the hardcoded checks in ~/.nvm/versions/node/v16.20.1/lib/node_modules/artillery/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:14 to also allow 30 GB for 4 vCPU (will update as soon as I get the metrics for 30 GB RAM :P )

UPDATE 2: Using 32 GB of RAM worked, but then the CPU was too low. Changing it to 16 vCPU with an arrival rate of one worked. Even the memory consumption went down dramatically once the CPU load was lower (I guess because the scenarios finished faster and did not pile up). In the end, I am using 16 vCPU + 32 GB RAM (because the lowest memory setting for 16 vCPU is 32 GB) and a very low arrival rate, and then scale with the Fargate task count to increase load.

UPDATE 3: After some extensive testing, I realized that the maxVusers setting is the key to a cost-efficient Playwright + Fargate setup. With maxVusers set to 3, I was able to use the default Fargate launch config again with 4 vCPU / 8 GB RAM. This setup completed (in my case) 19 vusers per worker, while the 16 vCPU / 32 GB setup completed 60 (whereas 19 × 4 = 76 vusers completed with 4 small workers).

So in the end, the memory spikes only occurred when more scenarios were created per unit of time than were finished. This piles up over time (especially if CPU load is >= 100%). The maxVusers setting is the key here to controlling the load on the workers.
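A minimal sketch of the kind of per-worker configuration described above (numbers taken from this thread, not general recommendations; plugins and scenarios omitted):

config:
  target: 'https://example.com'
  phases:
    - duration: 1800     # 30-minute run
      arrivalRate: 1     # very low per-worker arrival rate
      maxVusers: 3       # cap concurrent vusers so scenarios cannot pile up

Overall load is then scaled by increasing the number of Fargate tasks (workers) rather than by raising the per-worker arrival rate.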

Final remark: testing with 4 vCPU / 8 GB RAM (Artillery's default launch config for Fargate task workers) over 30 minutes with maxVusers=3 resulted in a very stable CPU load of around 70-80% and a stable memory consumption of only around 10% of 8 GB.

[Screenshot taken 2023-08-23 at 22:25:53]

Compared to my previous setup with 16 vCPU / 32 GB RAM (4 times the default launch config), using the default launch config with 4 vCPU / 8 GB RAM seems to be roughly 20% more cost-efficient in AWS Fargate.