Closed: Amalsalem closed this issue 1 year ago.
Can you please add more details, like which node is this? T4 or A100 GPU? What are you trying to do?
I am trying to run this model:
y_pred = model.train_second_phase_medmnist(x=x, y=y, kappa=1, n_clusters=10, maxiter=500, batch_size=40, tol=0.0, validate_interval=140, show_interval=200, save_interval=2800, save_dir=save_dir, aug_train=True)
and I am using a GPU, on node 2.
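Since the allocator error later in this thread comes from `_EagerConst` (a tensor being materialized as an eager constant), a quick back-of-the-envelope memory estimate can help locate the culprit. The shapes below are hypothetical MedMNIST-like dimensions chosen for illustration, not values taken from the original code:

```python
# Rough GPU-memory estimate for a dense tensor materialized on device.
def tensor_bytes(*shape, dtype_size=4):
    """Bytes needed for a dense tensor of the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_size

# One batch of 40 RGB 28x28 images is tiny (~0.36 MiB)...
batch = tensor_bytes(40, 28, 28, 3)
# ...but eagerly converting a full dataset of ~100k such images into a
# single constant tensor is not (~900 MiB), which is the scale of the
# allocation reported in the error message.
full = tensor_bytes(100_000, 28, 28, 3)
print(batch / 2**20, full / 2**20)
```

If the big allocation is the whole dataset being turned into one constant, lowering batch_size alone will not help; feeding the data in smaller chunks (e.g. via a dataset pipeline) would.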
But this message is clearly related to memory. Below is the available space:
[20015279@master ~]$ df -kh ~
Filesystem                               Size  Used Avail Use% Mounted on
10.240.240.3:/ifs/data/adhari/zone1/nfs  5.0G  4.9G  108M  98% /home/nfs
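The same check can be done programmatically before training starts, so a checkpoint save does not fail at 98% quota. A minimal sketch, assuming the home directory is the checkpoint destination (the 500 MiB threshold is an arbitrary illustrative value):

```python
import os
import shutil

# Check free space on the filesystem holding the checkpoint directory.
# os.path.expanduser("~") resolves to the NFS home in this setup.
usage = shutil.disk_usage(os.path.expanduser("~"))
free_mib = usage.free / 2**20
print(f"free: {free_mib:.0f} MiB of {usage.total / 2**20:.0f} MiB")

if free_mib < 500:  # illustrative threshold, not a hard rule
    print("Warning: low disk space; saving checkpoints may fail.")
```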
If we solve the memory issue, I believe the other issues, such as the copying in the encoder function, will be resolved as well.
The other messages are warnings, so I believe they will not stop the code from running.
The memory mentioned in the error message is the GPU RAM not the disk memory.
Make sure that the GPU RAM is not full by running !nvidia-smi in a notebook cell.
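If the GPU RAM turns out to be shared with other processes, one common mitigation is to ask TensorFlow to claim GPU memory incrementally instead of pre-allocating it all. This is a general TensorFlow sketch, not something from the original thread; the ImportError guard is only there so the snippet runs anywhere:

```python
# Sketch: enable incremental GPU memory allocation in TensorFlow.
# This does not reduce what the model itself needs, but it stops one
# process from pre-claiming the whole GPU and starving others.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before any GPU tensors are created.
        tf.config.experimental.set_memory_growth(gpu, True)
except ImportError:
    gpus = []  # TensorFlow not installed; nothing to configure

print(f"configured {len(gpus)} GPU(s) for memory growth")
```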
I get this message:
2023-04-23 16:22:59.602531: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 937.50MiB (rounded to 983040000) requested by op _EagerConst
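As a quick sanity check on the numbers in that message, the two figures the allocator reports describe the same allocation, since 1 MiB = 2**20 bytes:

```python
# The allocator reports 937.50 MiB, "rounded to 983040000" bytes.
# Converting confirms the two figures agree exactly.
requested_bytes = 983040000
mib = requested_bytes / 2**20
print(mib)  # 937.5
```

So the failing allocation is a single ~1 GB tensor requested by the _EagerConst op, which points at GPU RAM (not the disk quota) as the resource that ran out.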