gjbex / monitor

monitor logs cpu and memory usage of a running application
GNU General Public License v3.0

Relationship between "size (kb)" and "%mem" #4

Closed MaximeVdB closed 3 years ago

MaximeVdB commented 3 years ago

Hello Geert Jan,

Though not explicitly stated in the README, one somewhat expects that the "size (kb)" and "%mem" columns would be related as

size (kb) = total_memory_in_kb * %mem / 100

This relation is also implied in this document around page 105: https://hpcugent.github.io/vsc_user_docs/pdf/intro-HPC-linux-leuven.pdf

However, this does not always seem to hold. At least, with this little Python script,

from time import sleep
import numpy as np

x = np.ones((10000, 10000), dtype=np.float64)  #  array size: 800 MB
sleep(300)

I get the following output, on a machine with 192 GB of memory:

~$ nohup python test.py &
[1] 23173

~$ monitor -p 23173
time  (s)    size (kb)     %mem    %cpu
5            1975180       0.4     1.2
10           1975180       0.4     1
15           1975180       0.4     0.9

Based on the relationship above, one would expect a "size (kb)" value of about 192*0.004*1000*1000 = 768000. The listed value of 1975180, however, is 2.5 times higher than this.

Could it perhaps be related to the difference between 'virtual' and 'resident' memory use? I.e. that the 'size (kb)' column refers to virtual memory, while the '%mem' column is based on resident memory usage? At least, that seems to be consistent with the output from top:

~$ top -p 23173
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23173 username  20   0 2144320 802588   6788 S   0.0  0.4   0:00.38 python
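
One way to double-check this hypothesis is to read the VmSize and VmRSS fields from /proc/<pid>/status for the running process. A minimal sketch (assuming a Linux /proc filesystem; 23173 is just the pid from the session above):

def virt_and_res_kb(pid):
    """Return (VmSize, VmRSS) in kB for the given process id."""
    values = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value, _unit = line.split()
                values[key.rstrip(":")] = int(value)
    return values["VmSize"], values["VmRSS"]

virt, res = virt_and_res_kb(23173)
print(f"virtual: {virt} kB, resident: {res} kB")
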
gjbex commented 3 years ago

The size is what ps would return, as per its man page:

size        SIZE      approximate amount of swap space that would be
                      required if the process were to dirty all
                      writable pages and then be swapped out.  This
                      number is very rough!
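
For comparison, ps can report that field next to the resident size and %mem. A minimal sketch (assuming a procps ps is available, and reusing the pid from the session above):

import subprocess

pid = 23173  # the Python process monitored above
result = subprocess.run(
    ["ps", "-o", "size,vsz,rss,pmem", "-p", str(pid)],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
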
MaximeVdB commented 3 years ago

Thanks, so then I take it that the values of the "size (kb)" and "%mem" columns only follow the above relationship in certain situations, and not in general? And that "%mem" is (usually) more relevant than "size (kb)" for assessing the actual memory usage (e.g. to avoid out-of-memory errors)? If so, then I think this should be made clear in the documentation of monitor.

The following script provides an extreme example, where '%mem' is zero whereas the 'size' exceeds the 192 GB of available memory on the machine:

from time import sleep
import numpy as np

num = 100000  # array sizes of around 80 GB
x = np.empty((num, num), dtype=np.float64)
y = np.empty((num, num), dtype=np.float64)
z = np.empty((num, num), dtype=np.float64)
sleep(300)
gjbex commented 3 years ago

Well, not quite. The example that you mention is precisely something where size may come in handy. The memory is reserved and could be filled very quickly, perhaps faster than monitor is going to pick up on (depending on delta). So the application crashes while %mem is still reasonable, but in fact the memory is exhausted between sample points. Looking at size tells you that this could indeed have happened.

MaximeVdB commented 3 years ago

I did not mean to say that the 'size (kb)' value is not useful -- it definitely is! It's just that people might think it refers to the amount of (physical) memory actually in use by the process, which is only true in specific cases. For that information, one should look at the '%mem' value instead.

(For the record: the 'extreme' example above does not crash, since there is not necessarily a problem when the virtual memory usage exceeds the amount of available physical memory. But if it did crash (once it started actually using physical memory for those arrays), the 'size (kb)' value would indeed be helpful to find out what happened.)
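
A small sketch of that behaviour (assuming a Linux /proc filesystem and a machine like the one above): resident memory stays near zero after np.empty and only grows once the pages are actually written.

import os
import numpy as np

def rss_kb():
    """Return the resident set size (VmRSS) of this process in kB."""
    with open(f"/proc/{os.getpid()}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

x = np.empty((100000, 100000), dtype=np.float64)  # ~80 GB of address space reserved
print("after np.empty:       ", rss_kb(), "kB resident")

x[:1000, :] = 1.0  # touch ~0.8 GB worth of pages
print("after writing a slice:", rss_kb(), "kB resident")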

gjbex commented 3 years ago

I've adapted the README a bit (development branch only). Can you have a look to see whether that clarifies the issues you had?

Thanks.

MaximeVdB commented 3 years ago

Looks good, thanks!