qahsinahk closed 5 months ago
I have an RTX 3090 (30 GB). What should I do to make the model run at image size 256?
name of the file: /data1/Ishaq/environment/MedSegDiff/data/ISIC/Test_results/0000149_output_ens.jpg
name of the file: /data1/Ishaq/environment/MedSegDiff/data/ISIC/ISBI2016_ISIC_Part3B_Test_Data/ISIC_0000149_Segmentation.png
Traceback (most recent call last):
File "/data1/Ishaq/environment/MedSegDiff/scripts/segmentation_env.py", line 186, in
I get this error in the evaluation file, at this line.
Also, when should I stop training? So far I have reached 650,000 at most. Is that enough?
I have already tried different batch sizes, down to 1, but it still does not work. I don't know how to deal with it.
When I set the image size to 256, it throws an error. I have tried running on multiple GPUs and on a single GPU, but the model only runs at image size 64; with 256 it gives the following error.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 23.70 GiB total capacity; 3.57 GiB already allocated; 5.81 MiB free; 3.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
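For reference, here is a minimal sketch that does the arithmetic on the numbers in the message above. It uses only the standard library; the helper name `field_gib` is made up for illustration. The gap between reserved and allocated memory is tiny, so fragmentation (which `max_split_size_mb` targets) does not look like the main cause. Most of the card seems to be held by something outside this process's PyTorch allocator, possibly another process on GPU 0. That is an assumption, and it could be checked with `nvidia-smi`.

```python
import re

# Error text copied from the message above.
msg = ("CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 23.70 GiB total capacity; "
       "3.57 GiB already allocated; 5.81 MiB free; 3.60 GiB reserved in total by PyTorch)")

UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}

def field_gib(label: str) -> float:
    """Return the number that precedes `label` in the message, converted to GiB."""
    m = re.search(r"([\d.]+) (MiB|GiB) " + re.escape(label), msg)
    if m is None:
        raise ValueError(f"field not found: {label}")
    return float(m.group(1)) * UNIT_TO_GIB[m.group(2)]

total = field_gib("total capacity")
allocated = field_gib("already allocated")
free = field_gib("free")
reserved = field_gib("reserved in total")

# Memory that is neither free nor held by this process's PyTorch allocator.
outside = total - reserved - free

print(f"reserved - allocated: {reserved - allocated:.2f} GiB")
print(f"held outside this process's allocator: {outside:.2f} GiB")
```

If roughly 20 GiB really is held elsewhere, reducing the batch size inside this process would not help, which matches what is described above.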
When I train the model at image size 64, generate samples, and then try to evaluate them, I get the following error: RuntimeError: The size of tensor a (4096) must match the size of tensor b (65536) at non-singleton dimension 0
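The two sizes in this error are 64 × 64 = 4096 and 256 × 256 = 65536, which suggests that a flattened 64 × 64 prediction is being compared with a flattened 256 × 256 ground-truth mask. The sketch below reproduces that mismatch with toy data and shows one possible way around it: resizing the prediction to the ground-truth size before comparing. It is pure Python, and the helper names (`resize_nearest`, `flatten`, `dice`) are hypothetical, not taken from the MedSegDiff scripts.

```python
def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list-of-lists mask (hypothetical helper)."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def flatten(mask):
    """Flatten a 2-D mask into a 1-D list, row by row."""
    return [v for row in mask for v in row]

def dice(a, b):
    """Dice overlap of two equal-length binary sequences."""
    if len(a) != len(b):
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)}")
    inter = sum(x * y for x, y in zip(a, b))
    denom = sum(a) + sum(b)
    return 2 * inter / denom if denom else 1.0

# Toy stand-ins: a 64x64 prediction and a 256x256 ground truth,
# each with the same centre square set to 1.
pred = [[1 if 16 <= r < 48 and 16 <= c < 48 else 0 for c in range(64)]
        for r in range(64)]
truth = [[1 if 64 <= r < 192 and 64 <= c < 192 else 0 for c in range(256)]
         for r in range(256)]

# The flattened lengths match the two numbers in the error message.
print(len(flatten(pred)), len(flatten(truth)))

# Bring the prediction up to the ground-truth size, then compare.
pred_up = resize_nearest(pred, 256, 256)
score = dice(flatten(pred_up), flatten(truth))
print(score)
```

With real tensors or images, a library resize such as `torch.nn.functional.interpolate` or PIL's `Image.resize` would play the role of `resize_nearest`. The other option is to resize the ground-truth masks down to 64 × 64 so that both sides use the training image size.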
Furthermore, I trained the model for 3 days and the model was saved at 650,000; then I stopped training. Can someone tell me whether training will stop by itself, or do I have to stop it manually?
Can anyone please help me? I have been working on this for the past month. I just want some results.