maethor opened this issue 1 year ago
Hello @maethor, what we did with turbostress focused on stressing CPU and RAM, but we used different options:

- `--cpu-load`, which lets you define a target CPU load for the test. This is nice because usual stress tests are more all-or-nothing.
- `--ipsec-mb`, a stress test that performs cryptographic processing using advanced instructions such as AVX-512 (called ipsec below). We wanted to observe the impact of such instructions on power consumption.
- `--vm`, a test that specifically exercises memory stress methods, so that we can observe the impact of memory-intensive workloads.
- `--maximize`, where stress-ng launches different types of stressors (CPU, cache, memory, file) and sets them to the maximum settings allowed, in order to estimate a worst-case scenario. (There is an even more aggressive setting that caused the VM to crash; not sure it's needed, since we want an estimate of consumption under normal working conditions anyway.)
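As a sketch, the four profiles above might map to stress-ng invocations like the following. The flags come from the stress-ng man page, but the profile names, worker counts, and the 60 s timeout are my own assumptions, not turbostress's actual settings:

```shell
#!/bin/sh
# Sketch only: build the stress-ng command line for each profile discussed
# above. Profile names and the 60 s timeout are hypothetical.
DURATION=60

profile_cmd() {
  case "$1" in
    cpu50)    echo "stress-ng --cpu 0 --cpu-load 50 --timeout ${DURATION}s" ;;
    ipsec)    echo "stress-ng --ipsec-mb 0 --timeout ${DURATION}s" ;;
    vm)       echo "stress-ng --vm 2 --vm-bytes 75% --timeout ${DURATION}s" ;;
    # --maximize needs stressors to act on; this CPU/cache/vm/hdd mix is a guess.
    maximize) echo "stress-ng --cpu 0 --cache 0 --vm 1 --hdd 1 --maximize --timeout ${DURATION}s" ;;
    *)        echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Print the command instead of running it, so the sketch is side-effect free.
profile_cmd cpu50
```

With a worker count of 0, stress-ng spawns one worker per CPU; piping a generated line to `sh` would actually run the test.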
There are specific disk and network stressing modes we didn't investigate.
What would be interesting is to see, with a wattmeter, how these different settings actually change the measurement, and what the main driver is (presumably CPU and memory for hardware without a GPU).
Looking at stress-ng options
--iomix N
start N workers that perform a mix of sequential, random and memory mapped
read/write operations as well as forced sync'ing and (if run as root) cache
dropping. Multiple child processes are spawned to all share a single file and
perform different I/O operations on the same file.
-i N, --io N
start N workers continuously calling [sync](https://manpages.ubuntu.com/manpages/bionic/man2/sync.2.html)(2) to commit buffer cache to disk. This
can be used in conjunction with the --hdd options.
-d N, --hdd N
start N workers continually writing, reading and removing temporary files. The
default mode is to stress test sequential writes and reads. With the --aggressive
option enabled without any --hdd-opts options the hdd stressor will work through
all the --hdd-opt options one by one to cover a range of I/O options.
--getrandom N
`stress-ng --vm 1 --vm-bytes 75% --vm-method all`
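The man-page excerpts above can be turned into a candidate list of I/O tests. A sketch that only generates the command lines (worker counts and timeouts are arbitrary choices, not recommendations):

```shell
#!/bin/sh
# Sketch: emit one stress-ng command per candidate I/O stressor from the
# man-page excerpts above. Nothing is executed here.
io_tests() {
  for opt in "--iomix 2" "--io 2" "--hdd 2" "--getrandom 2"; do
    echo "stress-ng $opt --timeout 40s --metrics-brief"
  done
}
io_tests
```

`--metrics-brief` makes stress-ng print per-stressor bogo-op counts at the end, which helps compare runs.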
OK, I finally took the time to look at this. Thank you for your suggestions, @github-benjamin-davy and @da-ekchajzer.
I think for now we should be looking for generalist tests. I don't want to pinpoint specific use cases like crypto or random generation. And I absolutely want to test IOs.
So I played a little, and this is what I see:
- `--io` alone is nice because it generates 100% busy time on the disk, but no writes or reads.
- With `--io` and `--hdd` we do not need more than 1 worker. For good measure I will use 2, but the stats do not seem to change with 16 workers.
- `--hdd 2` alone seems to generate more writes than `--io 2 --hdd 2`.
- `--iomix` seems to be able to cause a lot of load, so I will be stress-testing it with 1, 2, 4, 8…
- `--getrandom` seems to load cpu_sys instead of cpu_user, so I will be using it with 1, 2, 4, 8…
- `--memrate` seems to generate a lot of power consumption in the DRAM section of RAPL, so I will be stress-testing it with 1, 2, 4, 8…

What I am missing is a disk read test.
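The 1, 2, 4, 8… worker sweep described above could be scripted like this (a sketch: only the command lines are generated, nothing runs; the 40 s timeout is an arbitrary choice):

```shell
#!/bin/sh
# Sketch: generate the worker-scaling matrix described above for the three
# stressors to be swept (1, 2, 4 and 8 workers each).
scaling_matrix() {
  for stressor in iomix getrandom memrate; do
    for n in 1 2 4 8; do
      echo "stress-ng --${stressor} ${n} --timeout 40s --metrics-brief"
    done
  done
}
scaling_matrix
```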
Here are some tests that read recursively, which seem good for a read test but will only affect some partitions…
--sysfs N
start N workers that recursively read files from [/sys](file:///sys) (Linux only). This may cause
specific kernel drivers to emit messages into the kernel log.
--getdent N
start N workers that recursively read directories [/proc](file:///proc), [/dev/](file:///dev/), [/tmp](file:///tmp), [/sys](file:///sys) and [/run](file:///run)
using getdents and getdents64 (Linux only).
--procfs N
start N workers that read files from [/proc](file:///proc) and recursively read files from
[/proc/self](file:///proc/self) (Linux only).
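For an actual disk read test, one option worth checking (my assumption, based on the man page's `--hdd-opts` list; verify the option names exist on your stress-ng version) is restricting `--hdd` to read methods, alongside the three `/proc` and `/sys` readers quoted above. A sketch that only prints the candidate commands:

```shell
#!/bin/sh
# Sketch: candidate read-oriented tests. The rd-seq/rd-rnd values for
# --hdd-opts are taken from the stress-ng man page; note the hdd stressor
# still has to write its temporary file before it can read it back.
read_tests() {
  echo "stress-ng --hdd 2 --hdd-opts rd-seq,rd-rnd --timeout 40s"
  for opt in sysfs getdent procfs; do
    echo "stress-ng --${opt} 2 --timeout 40s"
  done
}
read_tests
```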
For the first energizta.sh, my default stress test is too basic:
We need to check other cases: memory, disk I/O, maybe other types of CPU usage. `stress-ng` should be able to manage all of this. Any help and advice would be appreciated.

But we also need to limit the number of tests. By default we run each test for one minute: 20 seconds waiting for warmup, and 40 seconds of measurement. This can be discussed. But we should be careful that the full test doesn't run for 2 hours. I think we should limit the number of tests to 10 or 15. What do you think?
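For the time budget above, the arithmetic is simple: at one minute per test (20 s warmup + 40 s measurement), a cap of 15 tests keeps the full run to 15 minutes. A sketch:

```shell
#!/bin/sh
# Sketch: total wall-clock budget given the per-test timing described above.
WARMUP=20      # seconds of warmup, discarded
MEASURE=40     # seconds of actual measurement
MAX_TESTS=15   # proposed upper bound on the number of tests

total_minutes=$(( MAX_TESTS * (WARMUP + MEASURE) / 60 ))
echo "${MAX_TESTS} tests -> ${total_minutes} minutes total"
```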