maethor opened this issue 1 year ago
Hello @maethor, what we did with turbostress focused on stressing CPU and RAM, but we used different options:

- `--cpu-load`, which lets you define a target CPU load for the test. This is nice because usual stress tests are more all-or-nothing.
- `--ipsec-mb`, a stress test that performs cryptographic processing using advanced instructions such as AVX-512 (called ipsec below). We wanted to observe the impact of such instructions on power consumption.
- `--vm`, a test that specifically exercises memory stress methods, so that we can observe the impact of memory-intensive workloads.
- `--maximize`, where stress-ng launches different types of stressors (CPU, cache, memory, file) and sets them to the maximum settings allowed, in order to estimate a worst-case scenario. (There is an even more aggressive setting that caused the VM to crash; not sure it's needed, since we want an estimate of consumption under normal working conditions anyway.)
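As a sketch, the four profiles above might map to stress-ng invocations like the following. The flags come from the stress-ng man page, but the profile names, worker counts, and the 60 s timeout are my own assumptions, not turbostress's actual settings:

```shell
#!/bin/sh
# Sketch only: build the stress-ng command line for each profile discussed
# above. Profile names and the 60 s timeout are hypothetical.
DURATION=60

profile_cmd() {
  case "$1" in
    cpu50)    echo "stress-ng --cpu 0 --cpu-load 50 --timeout ${DURATION}s" ;;
    ipsec)    echo "stress-ng --ipsec-mb 0 --timeout ${DURATION}s" ;;
    vm)       echo "stress-ng --vm 2 --vm-bytes 75% --timeout ${DURATION}s" ;;
    # --maximize needs stressors to act on; this CPU/cache/vm/hdd mix is a guess.
    maximize) echo "stress-ng --cpu 0 --cache 0 --vm 1 --hdd 1 --maximize --timeout ${DURATION}s" ;;
    *)        echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Print the command instead of running it, so the sketch is side-effect free.
profile_cmd cpu50
```

With a worker count of 0, stress-ng spawns one worker per CPU; piping a generated line to `sh` would actually run the test.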
There are specific disk and network stressing modes we didn't investigate.
What would be interesting is to see, with a wattmeter, how these different settings actually change the measurement, and what the main driver is (presumably CPU and memory for hardware without a GPU).
Looking at stress-ng options
--iomix N
start N workers that perform a mix of sequential, random and memory mapped
read/write operations as well as forced sync'ing and (if run as root) cache
dropping. Multiple child processes are spawned to all share a single file and
perform different I/O operations on the same file.
-i N, --io N
start N workers continuously calling [sync](https://manpages.ubuntu.com/manpages/bionic/man2/sync.2.html)(2) to commit buffer cache to disk. This
can be used in conjunction with the --hdd options.
-d N, --hdd N
start N workers continually writing, reading and removing temporary files. The
default mode is to stress test sequential writes and reads. With the --aggressive
option enabled without any --hdd-opts options the hdd stressor will work through
all the --hdd-opt options one by one to cover a range of I/O options.
--getrandom N
`stress-ng --vm 1 --vm-bytes 75% --vm-method all`
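The man-page excerpts above can be turned into a candidate list of I/O tests. A sketch that only generates the command lines (worker counts and timeouts are arbitrary choices, not recommendations):

```shell
#!/bin/sh
# Sketch: emit one stress-ng command per candidate I/O stressor from the
# man-page excerpts above. Nothing is executed here.
io_tests() {
  for opt in "--iomix 2" "--io 2" "--hdd 2" "--getrandom 2"; do
    echo "stress-ng $opt --timeout 40s --metrics-brief"
  done
}
io_tests
```

`--metrics-brief` makes stress-ng print per-stressor bogo-op counts at the end, which helps compare runs.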
OK, I finally took the time to look at this. Thank you for your suggestions, @github-benjamin-davy and @da-ekchajzer.
I think for now we should be looking for generalist tests. I don't want to pinpoint specific use cases like crypto or random generation. And I absolutely want to test IOs.
So I played a little, and this is what I see:
- `--io` alone is nice because it generates 100% busy time on the disk, but no writes or reads.
- With `--io` and `--hdd` we do not need more than 1 worker. For good measure I will use 2, but the stats do not seem to change with 16 workers.
- `--hdd 2` alone seems to generate more writes than `--io 2 --hdd 2`.
- `--iomix` seems to be able to cause a lot of load, so I will be stress-testing it with 1, 2, 4, 8…
- `--getrandom` seems to load cpu_sys instead of cpu_user, so I will be using it with 1, 2, 4, 8…
- `--memrate` seems to generate a lot of power consumption in the DRAM section of RAPL, so I will be stress-testing it with 1, 2, 4, 8…

What I am missing is a disk read test.
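The 1, 2, 4, 8… worker sweep described above could be scripted like this (a sketch: only the command lines are generated, nothing runs; the 40 s timeout is an arbitrary choice):

```shell
#!/bin/sh
# Sketch: generate the worker-scaling matrix described above for the three
# stressors to be swept (1, 2, 4 and 8 workers each).
scaling_matrix() {
  for stressor in iomix getrandom memrate; do
    for n in 1 2 4 8; do
      echo "stress-ng --${stressor} ${n} --timeout 40s --metrics-brief"
    done
  done
}
scaling_matrix
```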
Here are some tests that read recursively, which seem good for a read test but will only affect some partitions…
--sysfs N
start N workers that recursively read files from [/sys](file:///sys) (Linux only). This may cause
specific kernel drivers to emit messages into the kernel log.
--getdent N
start N workers that recursively read directories [/proc](file:///proc), [/dev/](file:///dev/), [/tmp](file:///tmp), [/sys](file:///sys) and [/run](file:///run)
using getdents and getdents64 (Linux only).
--procfs N
start N workers that read files from [/proc](file:///proc) and recursively read files from
[/proc/self](file:///proc/self) (Linux only).
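For an actual disk read test, one option worth checking (my assumption, based on the man page's `--hdd-opts` list; verify the option names exist on your stress-ng version) is restricting `--hdd` to read methods, alongside the three `/proc` and `/sys` readers quoted above. A sketch that only prints the candidate commands:

```shell
#!/bin/sh
# Sketch: candidate read-oriented tests. The rd-seq/rd-rnd values for
# --hdd-opts are taken from the stress-ng man page; note the hdd stressor
# still has to write its temporary file before it can read it back.
read_tests() {
  echo "stress-ng --hdd 2 --hdd-opts rd-seq,rd-rnd --timeout 40s"
  for opt in sysfs getdent procfs; do
    echo "stress-ng --${opt} 2 --timeout 40s"
  done
}
read_tests
```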
For the first energizta.sh, my default stress test is too basic:
We need to check other cases: memory, disk I/O, maybe other types of CPU usage. `stress-ng` should be able to manage all of this. Any help and advice would be appreciated.

But we also need to limit the number of tests. By default we run each test for one minute: 20 seconds waiting for warmup, and 40 seconds of measurement. This can be discussed. But we should be careful that the full test doesn't run for 2 hours. I think we should limit the number of tests to 10 or 15. What do you think?
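For the time budget above, the arithmetic is simple: at one minute per test (20 s warmup + 40 s measurement), a cap of 15 tests keeps the full run to 15 minutes. A sketch:

```shell
#!/bin/sh
# Sketch: total wall-clock budget given the per-test timing described above.
WARMUP=20      # seconds of warmup, discarded
MEASURE=40     # seconds of actual measurement
MAX_TESTS=15   # proposed upper bound on the number of tests

total_minutes=$(( MAX_TESTS * (WARMUP + MEASURE) / 60 ))
echo "${MAX_TESTS} tests -> ${total_minutes} minutes total"
```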