UOB-AI / UOB-AI.github.io

A repository to host our documentations website.
https://UOB-AI.github.io

ran out of memory #20

Closed Amalsalem closed 1 year ago

Amalsalem commented 1 year ago

I get this message:

2023-04-23 16:22:59.602531: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 937.50MiB (rounded to 983040000)requested by op _EagerConst
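The allocator reports the failed request both in MiB and in bytes. The two figures agree because 1 MiB is 2^20 bytes (937.5 × 1,048,576 = 983,040,000). Below is a minimal Python sketch for estimating how much GPU memory a dense array needs before it is turned into a tensor. It assumes the element size is known (4 bytes for float32), and both helper names are hypothetical.

```python
from math import prod
from typing import Tuple

MIB = 2 ** 20  # bytes per mebibyte

def bytes_to_mib(nbytes: int) -> float:
    """Convert a byte count, as printed by the TensorFlow allocator, to MiB."""
    return nbytes / MIB

def array_nbytes(shape: Tuple[int, ...], itemsize: int = 4) -> int:
    """Bytes needed for a dense array of the given shape (itemsize=4 for float32)."""
    return prod(shape) * itemsize

# The request from the log above: 983040000 bytes is exactly 937.5 MiB.
print(bytes_to_mib(983040000))  # 937.5

# A hypothetical batch of 40 images of 28x28x3 float32 values is far smaller.
print(array_nbytes((40, 28, 28, 3)))  # 376320
```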

asubah commented 1 year ago

Can you please add more details, like which node is this? T4 or A100 GPU? What are you trying to do?

Amalsalem commented 1 year ago

I am trying to run this model:

y_pred = model.train_second_phase_medmnist(x=x, y=y, kappa=1, n_clusters=10, maxiter=500, batch_size=40, tol=0.0, validate_interval=140, show_interval=200, save_interval=2800, save_dir=save_dir, aug_train=True)

and I am using a GPU on node 2.
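The `_EagerConst` op in the log suggests that an in-memory array is being copied to the GPU as one constant tensor. A common way to keep the per-step footprint close to a single batch is to slice the data on the host and pass it over one batch at a time. The sketch below shows only that slicing, using a hypothetical `iter_batches` helper. This thread does not show whether `train_second_phase_medmnist` accepts batched input.

```python
from typing import Iterator, Sequence, Tuple

def iter_batches(x: Sequence, y: Sequence, batch_size: int = 40) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield (x_batch, y_batch) slices so only one batch needs to be on the GPU at a time.

    Works with lists and NumPy arrays alike, since both support slicing.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

# Usage with toy data: 100 samples split into batches of 40, 40 and 20.
sizes = [len(xb) for xb, _ in iter_batches(list(range(100)), list(range(100)), batch_size=40)]
print(sizes)  # [40, 40, 20]
```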


Amalsalem commented 1 year ago

But this message is clearly related to memory. Below is the available space:

[20015279@master ~]$ df -kh ~
Filesystem                               Size  Used Avail Use% Mounted on
10.240.240.3:/ifs/data/adhari/zone1/nfs  5.0G  4.9G  108M  98% /home/nfs
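The `df` figures describe the NFS home directory on disk, which is separate from the GPU's own memory. To watch that quota from inside a notebook, Python's standard `shutil.disk_usage` returns the total, used, and free byte counts for a path. Here is a minimal sketch. The `home_disk_report` name is hypothetical, and the percentages can differ slightly from `df`'s Use% column.

```python
import shutil
from pathlib import Path
from typing import Optional

def home_disk_report(path: Optional[str] = None) -> str:
    """Summarise disk usage for a path (default: the home directory), similar to `df -h ~`."""
    target = Path(path) if path else Path.home()
    usage = shutil.disk_usage(target)  # named tuple: total, used, free (in bytes)
    gib = 2 ** 30
    return (f"{target}: size {usage.total / gib:.1f}G, "
            f"used {usage.used / gib:.1f}G, free {usage.free / gib:.2f}G")

# This measures disk space only; it says nothing about GPU memory.
print(home_disk_report())
```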

Amalsalem commented 1 year ago

If we solve the memory issue, I believe the other issues, like the copying for the encoder function, will be solved as well.

The other messages are warnings, so I believe they will not stop the code from running.

asubah commented 1 year ago

The memory mentioned in the error message is the GPU's RAM, not disk storage. Make sure the GPU RAM is not full by running !nvidia-smi in a notebook cell.
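Besides printing the full table, `nvidia-smi` can output just the memory columns with `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits`. This gives one line per GPU, with values in MiB. Below is a minimal sketch that runs this query from Python and parses the result. The function names are hypothetical, and the sample string in the usage lines is illustrative input, not output from the cluster.

```python
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB) into (used_mib, total_mib) pairs, one per GPU."""
    pairs = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs

def query_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi and return per-GPU memory usage; requires an NVIDIA driver."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)

# Parsing illustrative text (not real cluster output) describing two GPUs.
for i, (used, total) in enumerate(parse_gpu_memory("15000, 15360\n512, 15360\n")):
    print(f"GPU {i}: {used}/{total} MiB used, {total - used} MiB free")
```

If a GPU already shows most of its memory as used before training starts, another process (for example, a second notebook kernel) may be holding it.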