eth-cscs / pyfr

pyfr@cscs (https://github.com/vincentlab/PyFR)

Is GPU memory usage an issue? #5

Open jgphpc opened 8 years ago

jgphpc commented 8 years ago

Arvind says the runs are designed to use almost all of the cards' memory (about 90%), which may explain the number of errors we are seeing.

The GPU memory used on 4 nodes (as reported by nvidia-smi) is close to the peak (~84%):

[Figure: eff_gpumemused]
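The thread does not say exactly how the per-node figure was collected. One way to get comparable per-GPU numbers is nvidia-smi's query mode, `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`, which prints one line per GPU with values in MiB. The sketch below parses that output into used fractions; the function names are hypothetical, and the offline example reuses the idle K20X values quoted later in this thread so it runs without a GPU.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used, memory.total' lines (MiB) into
    (index, used_mib, total_mib, used_fraction) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # int() tolerates the spaces nvidia-smi puts after each comma
        index, used, total = (int(field) for field in line.split(","))
        rows.append((index, used, total, used / total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi on the current node (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


if __name__ == "__main__":
    # Offline example: idle Tesla K20X values from the nvidia-smi output in this thread.
    for index, used, total, frac in parse_gpu_memory("0, 31, 5759"):
        print(f"GPU {index}: {used}/{total} MiB ({frac:.1%})")
```

On a compute node, `query_gpu_memory()` would return the same tuples for every visible GPU while the job is running.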

The available memory is NVIDIA's stated memory reduced by about 12.5% due to ECC. Did you take this ECC reduction into account?

Arvind: On my workstation, this job uses 5268 MiB per rank. This should be roughly the same for every job my script generates: a card has roughly 6 GB of memory, which appears as about 5.4 GB with ECC enabled, so this job uses roughly 96% of the GPU memory.
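The percentage depends heavily on which capacity is used as the denominator and on whether "GB" means 10^9 or 2^30 bytes. The short calculation below, a sketch using only numbers quoted in this thread, divides the 5268 MiB per-rank usage by each candidate capacity, including the 5759 MB total that nvidia-smi reports for a Tesla K20X later in the thread (treated here as MiB, an assumption).

```python
MIB = 2**20  # bytes per MiB
GIB = 2**30  # bytes per GiB


def usage_fraction(used_mib, capacity_bytes):
    """Fraction of a card's capacity occupied by used_mib MiB."""
    return used_mib * MIB / capacity_bytes


USED_MIB = 5268  # per-rank usage Arvind measured on his workstation

# Candidate denominators; which one counts as "the" card memory is the open question.
capacities = {
    "6 GiB nominal": 6 * GIB,
    "5.4 GiB (ECC, binary units)": 5.4 * GIB,
    "5.4 GB (ECC, decimal units)": 5.4e9,
    "5759 MiB (nvidia-smi total on a K20X)": 5759 * MIB,
}

for label, capacity in capacities.items():
    print(f"{label:40s} {usage_fraction(USED_MIB, capacity):6.1%}")
```

Reading 5.4 as GiB gives about 95%, close to the quoted 96%; reading it as decimal GB would put usage above 100%, so that reading cannot be the intended one. Against the 5759 MiB total, the same usage would be about 91.5%, assuming the per-rank figure carries over unchanged to a K20X.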

jgphpc commented 8 years ago

> available memory reduced by about 12.5% due to ECC

@pmessmer Is the available memory the total reported by nvidia-smi (~5.7 GB), or that total minus 12.5%?

    Product Name                    : Tesla K20X
    Memory Usage
        Total                       : 5759 MB
        Used                        : 31 MB
        Free                        : 5728 MB
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled

pmessmer commented 8 years ago

The total memory available to the application should be 5.7GB.
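A quick consistency check on the nvidia-smi output earlier in the thread: if the reported total is what the application sees, it should already sit below the card's nominal capacity. The sketch below assumes the K20X's nominal "6 GB" means 6 GiB (6144 MiB) and that nvidia-smi's "MB" are MiB; both are assumptions, not values stated in the thread.

```python
NOMINAL_MIB = 6 * 1024      # assumption: the K20X's "6 GB" means 6 GiB
REPORTED_TOTAL_MIB = 5759   # "Total" from the K20X nvidia-smi output, ECC enabled

# How far the reported total already sits below the nominal capacity.
reserved_mib = NOMINAL_MIB - REPORTED_TOTAL_MIB
reserved_fraction = reserved_mib / NOMINAL_MIB
print(f"already below nominal: {reserved_mib} MiB ({reserved_fraction:.1%})")

# Per the answer above, the application can use the reported total, so
# subtracting another 12.5% from it would underestimate the available memory.
underestimate_mib = REPORTED_TOTAL_MIB * (1 - 0.125)
print(f"reported total minus 12.5%: {underestimate_mib:.0f} MiB")
```

Under these assumptions the reported total is about 6.3% below nominal, so the ECC reservation is already reflected in the 5759 MB figure rather than something to subtract from it.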

However, keep in mind that this doesn’t mean you can actually store 5.7GB of payload data, as the allocator will have some granularity as well (I think the minimal block size is 512B).
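To see what allocator granularity means for usable capacity, here is a sketch that rounds each request up to a multiple of the block size and reports the padding. The 512-byte value is pmessmer's recollection rather than a documented figure, and the function names and request sizes are hypothetical, chosen only for illustration.

```python
GRANULARITY = 512  # bytes; pmessmer's recollection, not a documented value


def allocated_size(request_bytes, granularity=GRANULARITY):
    """Round a request up to the next multiple of the allocator granularity."""
    return -(-request_bytes // granularity) * granularity  # ceiling division


def padding_overhead(requests, granularity=GRANULARITY):
    """Bytes allocated beyond the payload for a list of request sizes."""
    payload = sum(requests)
    allocated = sum(allocated_size(r, granularity) for r in requests)
    return allocated - payload


# Hypothetical request sizes, for illustration only.
requests = [1, 512, 513, 10_000]
print(allocated_size(513), padding_overhead(requests))
```

Each allocation wastes at most `granularity - 1` bytes, so padding matters mainly when a run makes many small allocations; a few large arrays lose very little to it.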

(pmessmer's comment above was sent by email in reply to jgp's message of Wednesday, August 31, 2016 7:19 PM, subject "Re: [eth-cscs/pyfr] Is gpu memory usage an issue ? (#5)".)
