Thanks @Raynos. Can you share any more information that could help us reproduce the issue? How long has the test been running for? Are any plugins or custom code being used?
The test case is pretty small:
```yaml
scenarios:
  - name: 'Fetch total raised'
    flow:
      - post:
          headers:
            Content-Type: application/json
            X-Sound-Client-Key: '{{clientKey}}'
          url: '/graphql?op=total-raised'
          json:
            query: |
              query TotalRaisedPlatformLoadTest {
                totalRaisedPlatform {
                  usd
                  ethInWei
                }
              }
          capture:
            json: '$.data.totalRaisedPlatform.usd'
            as: 'totalRaisedUsd'
```
The config is as follows
```yaml
config:
  target: 'https://example.com'
  phases:
    - duration: 120
      arrivalRate: 100
      rampTo: 1000
  variables:
    clientKey: '{{ $processEnvironment.SERVICE_API_KEY }}'
    environment: 'staging'
  plugins:
    expect: {}
    metrics-by-endpoint:
      useOnlyRequestNames: true
    publish-metrics:
      - type: datadog
        # DD_API_KEY is an environment variable containing the API key
        apiKey: '{{ $processEnvironment.DD_API_KEY }}'
        prefix: 'artillery.publish_metrics_plugin.'
        tags:
          - 'environment:{{environment}}'
          - 'service:core-api'
  processor: 'processor.js'
  http:
    extendedMetrics: true
  ensure:
    maxErrorRate: 1
    max: 500
```
Thanks for sharing the script @Raynos. We're looking at another potential memory issue right now (that one is to do with longer soak tests), and will try to reproduce the issue you're seeing as well.
Hey @Raynos!
Thanks for the detailed reports; they were very helpful. As @hassy mentioned, I have been looking into another memory leak. That one was, however, unrelated to this issue (it was in Fargate, and has now been resolved). While replicating your setup, I also made a small improvement to the initial memory footprint of the Datadog reporter.
That said, beyond that I haven't found evidence of a memory leak in Artillery at this time. As Hassy mentioned in the other issue you opened (https://github.com/artilleryio/artillery/issues/1978), the high resource consumption should be a direct result of the large number of TCP connections being opened in a short time, as Artillery models a real workload.
While we may make further improvements in the future, right now we recommend using `loop` or distributed load testing with Lambda or Fargate in situations like this, which is why we provide those as open source now!
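For illustration, a minimal sketch of the `loop` suggestion applied to the scenario above could look like the following (the `count` value is an arbitrary example, not a maintainer recommendation): each virtual user repeats the request, rather than a new VU, and therefore a new TCP connection, being created for every single request.

```yaml
scenarios:
  - name: 'Fetch total raised (looped)'
    flow:
      - loop:
          - post:
              url: '/graphql?op=total-raised'
              headers:
                Content-Type: application/json
                X-Sound-Client-Key: '{{clientKey}}'
              json:
                query: |
                  query TotalRaisedPlatformLoadTest {
                    totalRaisedPlatform {
                      usd
                      ethInWei
                    }
                  }
        # Each VU sends 10 requests over its existing connection, so the
        # same request throughput needs far fewer concurrent virtual users.
        count: 10
```

Paired with a correspondingly lower arrivalRate, this keeps far fewer sockets open at any one time, which is where the resource consumption described above comes from.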
Thanks!
It should never use 9 gigabytes of memory. That's the most obvious memory leak I've ever seen in my life.
I'd prefer it to kill itself and fail with an OOM exception at some reasonable limit like 2 or 4 gigabytes rather than continue growing unbounded.
Hm, I got the same problem, with Fargate + Playwright in my case. I updated Artillery to the latest version (2.0.0-36) after reading this issue.
The first 3 runs in the screenshot are with the default launch config; for the last run I overrode the launch config to use the maximum memory setting that Fargate allows (12 GB), and memory still skyrockets to 100%.
UPDATE
Actually, I just read that Fargate updated its limits. I just had to change the hardcoded checks in ~/.nvm/versions/node/v16.20.1/lib/node_modules/artillery/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:14 to also allow 30 GB for 4 vCPU (will update as soon as I get the metrics for 30 GB RAM :P).
UPDATE 2: Using 32 GB of RAM worked, but then the CPU was too low. Changing it to 16 vCPU with an arrival rate of one worked. Even the memory consumption went down dramatically once the CPU load was lower (I guess because the scenarios finished faster and did not pile up). In the end, I am using 16 vCPU + 32 GB RAM (because 32 GB is the lowest memory setting for 16 vCPU) and a very low arrival rate, and then scale the Fargate task count to increase load.
UPDATE 3: After some extensive testing, I realized that the maxVusers setting is the key to a cost-efficient Playwright + Fargate setup. Setting maxVusers to 3, I was able to use the default Fargate launch config again with 4 vCPU / 8 GB RAM. This setup completed (in my case) 19 vusers, while the 16 vCPU / 32 GB setup completed 60 (and 4 small workers complete 19 * 4 = 76 users).
So in the end, the memory spikes only occurred when more scenarios were created per unit of time than were finished. This piles up over time (especially when CPU load is >= 100%). The maxVusers setting is the key to controlling the load on the workers.
Final remark: testing with 4 vCPU / 8 GB RAM (the Artillery default launch config for Fargate task workers) over 30 minutes with maxVusers=3 resulted in a very stable CPU load of around 70-80% and a stable memory consumption of only around 10% of the 8 GB.
Compared to my previous setup with 16 vCPU / 32 GB RAM (4 times the default launch config), using the default launch config with 4 vCPU / 8 GB RAM seems to be roughly 20% more cost efficient on AWS Fargate.
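To make that final setup concrete, here is a minimal sketch of what such a configuration could look like. The target, processor file, and test function name are placeholders; only the duration, arrival rate, and maxVusers values come from the description above.

```yaml
config:
  target: 'https://staging.example.com'   # placeholder target
  processor: './flows.js'                 # hypothetical file exporting the Playwright function
  engines:
    playwright: {}                        # enable the Playwright engine
  phases:
    - duration: 1800                      # the 30-minute run described above
      arrivalRate: 1                      # very low arrival rate per worker
      maxVusers: 3                        # cap concurrent VUs so scenarios cannot pile up
scenarios:
  - engine: playwright
    testFunction: 'browseAndCheckout'     # hypothetical function name
```

Load is then scaled horizontally by increasing the Fargate task count rather than by raising the arrival rate on a single worker.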
The RAM went up to 8 gigabytes.