KidsWithTokens / MedSegDiff

Medical Image Segmentation with Diffusion Model

Image Size 256 problem / Evaluation problem #153

Closed qahsinahk closed 5 months ago

qahsinahk commented 5 months ago

When I set the image size to 256, it throws an error. I tried running on multiple GPUs and on a single GPU, but the model only runs at image size 64; at 256 it gives the following error.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 23.70 GiB total capacity; 3.57 GiB already allocated; 5.81 MiB free; 3.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

And when I train the model at image size 64, then generate samples and try to evaluate them, I get the following error: RuntimeError: The size of tensor a (4096) must match the size of tensor b (65536) at non-singleton dimension 0

Furthermore, I trained the model for three days and the last checkpoint was saved at step 650,000, at which point I stopped training. Can someone tell me whether training stops by itself or whether I have to stop it?

Can anyone please help me? I have been working on this for the past month, and I just want some results.

WuJunde commented 5 months ago
  1. Your GPU memory is not enough; see the command sketch after this list.
  2. Which line raises the error? It may be because the image size you use at evaluation differs from the one used in training.
  3. You need to stop training yourself.
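
A minimal sketch of the memory workarounds, assuming the repo's training entry point and flag names (`scripts/segmentation_train.py`, `--data_dir`, `--image_size`, `--batch_size`); the `max_split_size_mb` setting is the one suggested by the OOM message itself:

```bash
# Sketch only: flag names are assumptions based on the repo's training script.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128  # fragmentation workaround named in the OOM message

# Reduce the batch size before reducing resolution: per-sample activation
# memory grows quadratically, so 256x256 costs roughly 16x as much as 64x64.
python scripts/segmentation_train.py --data_dir data/ISIC \
    --image_size 256 --batch_size 1
```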
qahsinahk commented 5 months ago

I have a 24 GB RTX 3090 (the OOM message above reports 23.70 GiB total capacity). What should I do to make it run at image size 256?

name of the file: /data1/Ishaq/environment/MedSegDiff/data/ISIC/Test_results/0000149_output_ens.jpg
name of the file: /data1/Ishaq/environment/MedSegDiff/data/ISIC/ISBI2016_ISIC_Part3B_Test_Data/ISIC_0000149_Segmentation.png
Traceback (most recent call last):
  File "/data1/Ishaq/environment/MedSegDiff/scripts/segmentation_env.py", line 186, in <module>
    main()
  File "/data1/Ishaq/environment/MedSegDiff/scripts/segmentation_env.py", line 179, in main
    temp = eval_seg(pred, gt)
  File "/data1/Ishaq/environment/MedSegDiff/scripts/segmentation_env.py", line 142, in eval_seg
    edice += dice_coeff(vpred[:,0,:,:], gt_vmask_p[:,0,:,:]).item()
  File "/data1/Ishaq/environment/MedSegDiff/scripts/segmentation_env.py", line 93, in dice_coeff
    s = s + DiceCoeff().forward(c[0], c[1])
  File "/data1/Ishaq/environment/MedSegDiff/scripts/segmentation_env.py", line 61, in forward
    self.inter = torch.sum(input_flat * target_flat)
RuntimeError: The size of tensor a (4096) must match the size of tensor b (65536) at non-singleton dimension 0

I get this error in the evaluation file, at the last line of the traceback (segmentation_env.py, line 61).

Furthermore, when do I stop the model? I have reached step 650,000 at most; is that enough?

WuJunde commented 5 months ago
  1. You may need to decrease the batch size to run at a higher resolution.
  2. It seems your mask and your input image have different resolutions; one way to align them is sketched below.
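
One way to reconcile the two resolutions before the Dice computation is to downsample the ground-truth mask to the prediction's spatial size. A minimal sketch, assuming (N, C, H, W) tensors as in the traceback above; `match_resolution` is a hypothetical helper, not part of `segmentation_env.py`:

```python
import torch
import torch.nn.functional as F

def match_resolution(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Resize gt to pred's spatial size; nearest-neighbour keeps the mask binary."""
    if gt.shape[-2:] != pred.shape[-2:]:
        gt = F.interpolate(gt.float(), size=pred.shape[-2:], mode="nearest")
    return gt

# The mismatch from the error: a 64x64 prediction against a 256x256 mask.
pred = torch.rand(1, 1, 64, 64)                  # 64*64 = 4096 elements per channel
gt = (torch.rand(1, 1, 256, 256) > 0.5).float()  # 256*256 = 65536 elements
gt = match_resolution(pred, gt)                  # now (1, 1, 64, 64); Dice can proceed
```

Upsampling the prediction to 256x256 instead would also make the shapes agree, but evaluating at the training resolution avoids inventing detail the model never produced.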
qahsinahk commented 5 months ago

I already tried different batch sizes, down to 1, but it is still not working. I don't know how to deal with it.

  1. I think I am training the model at image size 64 while the segmentation_env file is processing files of image size 256; that is where the incompatibility arises (64 × 64 = 4096 flattened pixels versus 256 × 256 = 65536, exactly the two tensor sizes in the error). A quick file-level check is sketched below.
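
If that diagnosis is right, a quick check of the two files should show the size gap directly. A sketch using the exact paths from the traceback above; resizing with nearest-neighbour keeps the segmentation mask binary:

```python
from PIL import Image

pred = Image.open("/data1/Ishaq/environment/MedSegDiff/data/ISIC/Test_results/0000149_output_ens.jpg")
gt = Image.open("/data1/Ishaq/environment/MedSegDiff/data/ISIC/ISBI2016_ISIC_Part3B_Test_Data/ISIC_0000149_Segmentation.png")

print(pred.size, gt.size)  # expect (64, 64) vs (256, 256), given the 4096/65536 error
if gt.size != pred.size:
    gt = gt.resize(pred.size, Image.NEAREST)  # align before running the Dice evaluation
```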