ColinIanKing / stress-ng

This is the stress-ng upstream project git repository. stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
https://github.com/ColinIanKing/stress-ng
GNU General Public License v2.0
1.82k stars, 290 forks

%MEM remains 0 #396

Closed: dbkinghorn closed this issue 6 months ago

dbkinghorn commented 6 months ago

When I run this command: stress-ng --vm 1 --vm-bytes 500g --vm-keep -t 60s

For the entire run top shows

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                 
3105652 kinghorn  20   0  500.1g      0      0 R 100.0   0.0   0:18.15 stress-ng    

This is on a dual 65-core EPYC system with 1.5 TB of memory, running Ubuntu 22.04.

Same thing when running with stress-ng --sequential 128 --class cpu -t 20s --metrics-brief --times

top always shows %MEM as 0

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3103656 kinghorn  20   0   87292      0      0 R 100.0   0.0   0:04.20 stress-ng                                                                               
3103657 kinghorn  20   0   87292      0      0 R 100.0   0.0   0:04.20 stress-ng                                                                               
3103658 kinghorn  20   0   87292      0      0 R 100.0   0.0   0:04.20 stress-ng                                                                               
3103659 kinghorn  20   0   87292      0      0 R 100.0   0.0   0:04.20 stress-ng  
...

Any ideas of what is going on?

ColinIanKing commented 6 months ago

Which kernel is being used? Use the following command to show the kernel version: uname -r

I suggest also running the same command using the --vmstat 1 option to show the memory use every second:

stress-ng --vm 1 --vm-bytes 500g --vm-keep -t 60s --vmstat 1

dbkinghorn commented 6 months ago

6.5.0-35-generic

Adding --vmstat 1

stress-ng: info:  [3105664] vmstat  r  b      swpd      free      buff     cache   si   so     bi     bo   in   cs us sy id wa st
stress-ng: info:  [3105664] vmstat  2  0   8388604 1572061672    339508   3392044    0    0      0      0  407  115  0  0 100  0  0
stress-ng: info:  [3105664] vmstat  2  0   8388604 1572042016    339508   3392196    0    0      0      0  367  120  0  0 100  0  0
stress-ng: info:  [3105664] vmstat  2  0   8388604 1572022620    339508   3392196    0    0      0      0  402  156  0  0 100  0  0
stress-ng: info:  [3105664] vmstat  2  0   8388604 1572002712    339508   3392196    0    0      0      0  648  630  0  0 100  0  0
stress-ng: info:  [3105664] vmstat  2  0   8388604 1571983308    339508   3392196    0    0      0      0  331   86  0  0 100  0  0
stress-ng: info:  [3105664] vmstat  3  0   8388604 1571962592    339516   3392196    0    0      0    148  478  195  0  0 100  0  0
stress-ng: info:  [3105664] vmstat  2  0   8388604 1571943188    339516   3392196    0    0      0      0  410  127  0  0 100  0  0
stress-ng: info:  [3105664] vmstat  2  0   8388604 1571922812    339516   3392196    0    0      0      0 1188  876  0  0 100  0  0
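
The free column above does shrink from one sample to the next. A rough sketch of the arithmetic, assuming the vmstat memory columns are in KiB, that the samples are one second apart (from --vmstat 1), that 500g means 500 GiB, and that the drop in free memory is due to the vm stressor (none of which the output itself confirms):

```python
# Values copied from the "free" column of the --vmstat 1 output above.
# Assumption: units are KiB and samples are 1 second apart.
free_kib = [
    1572061672, 1572042016, 1572022620, 1572002712,
    1571983308, 1571962592, 1571943188, 1571922812,
]


def mean_drop_per_sample(samples):
    """Average decrease between consecutive samples."""
    drops = [a - b for a, b in zip(samples, samples[1:])]
    return sum(drops) / len(drops)


rate_kib_s = mean_drop_per_sample(free_kib)

# Assumption: --vm-bytes 500g requests 500 GiB.
target_kib = 500 * 1024 * 1024

# Projection only holds if the rate stayed constant, which it may not.
hours = target_kib / rate_kib_s / 3600

print(f"free memory drops by about {rate_kib_s / 1024:.1f} MiB/s")
print(f"at that rate, 500 GiB would take about {hours:.1f} hours")
```

If those assumptions hold, a 60 second run touches only a tiny fraction of the 500 GiB mapping.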

ColinIanKing commented 6 months ago

I suspect that it takes a few seconds for the mmap/madvise on such a large memory mapping to set up, and then the vm stressor takes a while to walk through all the mapped pages. Because of the copy-on-write semantics of mmap, physical pages are only allocated as they are touched, and some of the vm exercising routines take a while to walk through all the pages. I suggest running the stressor for several minutes with --vmstat 10 to see how memory is being consumed.
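
The allocate-on-touch behaviour described above can be observed with a small standalone sketch. This is not stress-ng code: it is a minimal illustration, assuming Linux (it reads resident pages from /proc/self/statm) and Python 3. The helper names rss_bytes and touch_demo are made up for this example.

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")


def rss_bytes():
    """Resident set size of this process (Linux only).

    The second field of /proc/self/statm is the resident page count.
    """
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * PAGE


def touch_demo(size=64 * 1024 * 1024):
    """Map `size` bytes anonymously, then write one byte per page.

    Returns the RSS before mapping, after mapping, and after touching.
    """
    before = rss_bytes()
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped = rss_bytes()           # virtual size grew, RSS barely moves
    for off in range(0, size, PAGE):
        buf[off] = 1               # first write faults the page in
    touched = rss_bytes()          # RSS now includes the touched pages
    buf.close()
    return before, mapped, touched


if __name__ == "__main__":
    before, mapped, touched = touch_demo()
    mib = 1024 * 1024
    print(f"RSS growth after mmap:  {(mapped - before) / mib:.1f} MiB")
    print(f"RSS growth after touch: {(touched - mapped) / mib:.1f} MiB")
```

On a typical Linux system the first figure should stay near zero while the second approaches the mapping size, which is the same pattern as a large VIRT with a small RES early in a run.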

ColinIanKing commented 6 months ago

Now that this has been explained, is this still an issue? Note that I pushed a commit, https://github.com/ColinIanKing/stress-ng/commit/d15a765ed8d852c2a8f0f4fb9fcfcec682dbe4a5, that can populate all the required physical pages after the mmap'ing has occurred. Use the --page-in option to enable this for the vm stressor.
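
For readers who want to see what pre-populating a mapping looks like in general, here is a hedged sketch. It uses the Linux MAP_POPULATE flag at mmap time, which is a different mechanism and not necessarily what the commit above or --page-in does. It assumes Linux and Python 3.10 or later (where mmap.MAP_POPULATE is exposed). The helper names are hypothetical.

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")


def resident_bytes():
    """Resident set size of this process, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * PAGE


def map_populated(size):
    """Anonymous private mapping that asks the kernel to prefault its pages.

    MAP_POPULATE is best effort: the kernel may populate fewer pages
    than requested without reporting an error.
    """
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | mmap.MAP_POPULATE
    return mmap.mmap(-1, size, flags=flags)


if __name__ == "__main__":
    size = 32 * 1024 * 1024
    before = resident_bytes()
    buf = map_populated(size)
    after = resident_bytes()
    print(f"RSS growth right after mmap: {(after - before) / 2**20:.1f} MiB")
    buf.close()
```

With prefaulting, resident memory grows as soon as the mapping is created rather than gradually as the pages are walked.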

dbkinghorn commented 6 months ago

That's great! I do often have systems with crazy amounts of memory to stress test. Thanks for your quick response and help. I had forgotten to close, will do that now. Best wishes!