Closed: dbkinghorn closed this issue 6 months ago.
When I run this:

stress-ng --vm 1 --vm-bytes 500g --vm-keep -t 60s

top shows %MEM as 0 for the entire run. This is on a dual 64-core EPYC system with 1.5 TB of memory running Ubuntu 22.04. The same thing happens when running

stress-ng --sequential 128 --class cpu -t 20s --metrics-brief --times

top always shows %MEM as 0. Any ideas what is going on?
Which kernel is being used? Use the following command to show the kernel version:

uname -r
I suggest also running the same command using the --vmstat 1 option to show the memory use every second:
stress-ng --vm 1 --vm-bytes 500g --vm-keep -t 60s --vmstat 1
6.5.0-35-generic
Adding --vmstat 1 gives:
stress-ng: info: [3105664] vmstat r b swpd free buff cache si so bi bo in cs us sy id wa st
stress-ng: info: [3105664] vmstat 2 0 8388604 1572061672 339508 3392044 0 0 0 0 407 115 0 0 100 0 0
stress-ng: info: [3105664] vmstat 2 0 8388604 1572042016 339508 3392196 0 0 0 0 367 120 0 0 100 0 0
stress-ng: info: [3105664] vmstat 2 0 8388604 1572022620 339508 3392196 0 0 0 0 402 156 0 0 100 0 0
stress-ng: info: [3105664] vmstat 2 0 8388604 1572002712 339508 3392196 0 0 0 0 648 630 0 0 100 0 0
stress-ng: info: [3105664] vmstat 2 0 8388604 1571983308 339508 3392196 0 0 0 0 331 86 0 0 100 0 0
stress-ng: info: [3105664] vmstat 3 0 8388604 1571962592 339516 3392196 0 0 0 148 478 195 0 0 100 0 0
stress-ng: info: [3105664] vmstat 2 0 8388604 1571943188 339516 3392196 0 0 0 0 410 127 0 0 100 0 0
stress-ng: info: [3105664] vmstat 2 0 8388604 1571922812 339516 3392196 0 0 0 0 1188 876 0 0 100 0 0
I suspect that it takes a few seconds for the mmap/madvise on such a large memory mapping to be set up, and then the vm stressor takes a while to walk through all the mapped pages. Due to the copy-on-write semantics of mmap, the physical pages are only allocated as they are touched, and with some of the vm exercising routines it takes a while to walk through all the pages. I suggest running the stressor for several minutes with --vmstat 10 to see how memory is being consumed.
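For scale: in the vmstat output above, free drops by roughly 20,000 KiB per one-second sample, i.e. pages are being faulted in at about 20 MB/s, so a 500 GB mapping would take hours to fully populate and a 60 second run barely registers in %MEM. The demand-paging behaviour itself is easy to reproduce outside stress-ng; here is a minimal standalone C sketch (illustrative only, not stress-ng's code) showing that a large anonymous mapping costs no resident memory until its pages are touched:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = (size_t)1 << 30;  /* 1 GiB of virtual address space */

    /* Anonymous private mapping: only address space is reserved here,
     * no physical pages, so top shows a large VIRT but ~0 %MEM. */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Touch one byte per page; RSS (top's RES and %MEM) grows only as
     * this loop faults pages in, which is why a huge mapping can sit
     * at 0 %MEM until the stressor has walked all of it. */
    const long page = sysconf(_SC_PAGESIZE);
    for (size_t i = 0; i < len; i += (size_t)page)
        buf[i] = 1;

    puts("all pages touched; RSS now reflects the full mapping");
    pause();  /* keep the process alive so it can be inspected in top */
    return 0;
}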
Now that this has been explained, is this still an issue? Note that I pushed commit https://github.com/ColinIanKing/stress-ng/commit/d15a765ed8d852c2a8f0f4fb9fcfcec682dbe4a5 that can populate all the required physical pages after the mmap'ing has occurred; just use the --page-in option to enable this for the vm stressor.
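For example, with a build that includes that commit, the original invocation becomes:

stress-ng --vm 1 --vm-bytes 500g --vm-keep --page-in -t 60s

and top should then show %MEM climbing toward the full allocation early in the run rather than staying at 0.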
That's great! I do often have systems with crazy amounts of memory to stress test. Thanks for your quick response and help. I had forgotten to close this issue; I'll do that now. Best wishes!