jgphpc opened 8 years ago
"Available memory reduced by about 12.5% due to ECC."
@pmessmer Is the total memory reported by nvidia-smi (~5.7 GB) what is actually available, or is the available amount ~5.7 GB minus 12.5%?
Product Name : Tesla K20X
Memory Usage
    Total : 5759 MB
    Used : 31 MB
    Free : 5728 MB
Ecc Mode
    Current : Enabled
    Pending : Enabled
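To collect the same used/total figures from every GPU on a node without reading the full `nvidia-smi -q` dump, the CSV query mode can be used. The sketch below is an illustration, not part of the original thread: `parse_usage` and `query_local_gpus` are hypothetical helper names, and it assumes nvidia-smi reports memory in MiB (the unit it labels "MB" above).

```python
import shutil
import subprocess

# CSV query: one line per GPU, e.g. "0, 31, 5759" (index, used, total in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_usage(csv_text: str) -> dict[int, float]:
    """Map GPU index -> percentage of memory.total currently in use."""
    usage = {}
    for line in csv_text.strip().splitlines():
        # int() tolerates the spaces nvidia-smi puts after each comma.
        idx, used, total = (int(field) for field in line.split(","))
        usage[idx] = 100.0 * used / total
    return usage


def query_local_gpus() -> dict[int, float]:
    """Run the query on this node; return {} if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return {}
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_usage(result.stdout)


if __name__ == "__main__":
    for idx, pct in sorted(query_local_gpus().items()):
        print(f"GPU {idx}: {pct:.1f}% of reported total in use")
```

In a multi-node run this would have to be executed once per node (for example through the job launcher); the percentages are relative to the "Total" nvidia-smi reports, which here is the ECC-enabled figure.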
The total memory available to the application should be 5.7 GB.
Keep in mind, however, that this doesn't mean you can actually store 5.7 GB of payload data, because the allocator also has some granularity (I think the minimal block size is 512 B).
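As a rough illustration of why allocator granularity eats into payload capacity, the sketch below rounds each request up to a whole number of blocks. The 512 B block size is taken from the tentative recollection above and is an assumption; the real CUDA allocator granularity may differ. `rounded_size` and `payload_fraction` are hypothetical helpers.

```python
BLOCK_BYTES = 512  # assumption: minimal block size as recalled in this thread


def rounded_size(nbytes: int, block: int = BLOCK_BYTES) -> int:
    """Bytes actually reserved for a request: round up to whole blocks."""
    return -(-nbytes // block) * block  # ceiling division without floats


def payload_fraction(request_sizes: list[int], block: int = BLOCK_BYTES) -> float:
    """Fraction of reserved bytes that hold requested payload."""
    requested = sum(request_sizes)
    reserved = sum(rounded_size(n, block) for n in request_sizes)
    return requested / reserved


# A 1000-byte request occupies two 512-byte blocks (1024 bytes);
# many tiny allocations waste most of each block.
print(rounded_size(1000))             # 1024
print(payload_fraction([100] * 10))   # 0.1953125
```

The overhead only matters when allocations are small or numerous; a few large arrays lose at most one partial block each.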
From: jgp [mailto:notifications@github.com] Sent: Wednesday, August 31, 2016 7:19 PM To: eth-cscs/pyfr Cc: Peter Messmer; Mention Subject: Re: [eth-cscs/pyfr] Is gpu memory usage an issue ? (#5)
Arvind says the runs are designed to use almost all of the cards' memory (~90%), which may explain the number of errors we are seeing.
The GPU memory used on 4 nodes (as reported by nvidia-smi) is close to the peak (~84%):
Available memory is NVIDIA's stated memory reduced by about 12.5% due to ECC. Did you take this ECC reduction into account?
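A quick worked version of the 12.5% rule, for comparison with the "Total" nvidia-smi reports above. Assumptions (not stated in the thread): the nominal "6 GB" of a K20X is taken as 6 GiB (6144 MiB), and nvidia-smi's "MB" is read as MiB.

```python
NOMINAL_MIB = 6 * 1024       # assumption: "6 GB" card taken as 6 GiB
ECC_FRACTION = 0.125         # the ~12.5% (1/8) ECC reduction quoted in the thread
REPORTED_TOTAL_MIB = 5759    # nvidia-smi "Total" with ECC enabled, from this thread


def usable_after_ecc(nominal_mib: float, fraction: float = ECC_FRACTION) -> float:
    """Capacity left after subtracting the ECC fraction from the nominal size."""
    return nominal_mib * (1 - fraction)


estimate = usable_after_ecc(NOMINAL_MIB)
print(estimate)                        # 5376.0
print(REPORTED_TOTAL_MIB - estimate)   # 383.0
```

Under these assumptions the two figures differ by 383 MiB, so the rule of thumb and the nvidia-smi total do not describe the same number; per the reply earlier in this thread, the nvidia-smi total is what the application can use.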
Arvind: On my workstation, 5268 MiB is used per rank for this job. This should be roughly the same for any number of jobs my script generates: a card has roughly 6 GB of memory, which appears as 5.4 GB with ECC enabled, so this job uses roughly 96% of the GPU memory.
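The utilization figure depends on which capacity is used as the denominator. The sketch below computes the per-rank usage against two candidates from this thread: the nvidia-smi "Total" of 5759 MB, and Arvind's 5.4 GB estimate read as 5.4 GiB (that unit reading is an assumption).

```python
USED_PER_RANK_MIB = 5268  # Arvind's per-rank measurement


def utilization(used_mib: float, capacity_mib: float) -> float:
    """Percentage of capacity_mib occupied by used_mib."""
    return 100.0 * used_mib / capacity_mib


candidates = {
    "nvidia-smi Total, ECC on": 5759,
    "5.4 GB estimate, read as GiB": 5.4 * 1024,  # assumption about units
}

for label, capacity in candidates.items():
    print(f"{label}: {utilization(USED_PER_RANK_MIB, capacity):.1f}%")
# nvidia-smi Total, ECC on: 91.5%
# 5.4 GB estimate, read as GiB: 95.3%
```

Measured against the nvidia-smi total, the job sits nearer 91.5% than 96%, though either way it leaves little headroom.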