SuperSupermoon / MedViLL

MedViLL official code. (Published IEEE JBHI 2021)
MIT License

Training Error #7

Closed: jainnipun11 closed this issue 2 years ago

jainnipun11 commented 2 years ago

Hey! I unzipped the images into the suggested path, but I still keep getting this error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/data_storage/mimic-cxr/dataset/image_preprocessing/re_512_3ch/Train/s50328096.jpg'

Can you explain why this error occurs?

Thanks.

ttumyche commented 2 years ago

Hi jainnipun,

I guess that error occurred in this line. How did you define the suggested path? Alternatively, you can just change 'fixed_path' to your own path.
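A minimal sketch of the "change 'fixed_path' to yours" advice, assuming the images sit under `<root>/re_512_3ch/<split>/<file>` as in the traceback above. Only `fixed_path` comes from this thread; `resolve_image_path`, `DEFAULT_ROOT`, and the `MEDVILL_IMAGE_ROOT` environment variable are hypothetical names for illustration, not part of the MedViLL code. The idea is to make the root configurable and fail early with a message that says which directory is expected.

```python
import os
from pathlib import Path

# Hypothetical default: the hard-coded prefix visible in the traceback above.
DEFAULT_ROOT = "/home/data_storage/mimic-cxr/dataset/image_preprocessing"


def resolve_image_path(filename, split="Train", root=None):
    """Build <root>/re_512_3ch/<split>/<filename> and check that it exists.

    `root` falls back to the (hypothetical) MEDVILL_IMAGE_ROOT environment
    variable, then to DEFAULT_ROOT.
    """
    root = Path(root or os.environ.get("MEDVILL_IMAGE_ROOT", DEFAULT_ROOT))
    path = root / "re_512_3ch" / split / filename
    if not path.is_file():
        # Fail with a hint about which directory layout is expected.
        raise FileNotFoundError(
            f"{path} not found; point the root (the repo's fixed_path) at the "
            f"directory that contains re_512_3ch/{split}/"
        )
    return path
```

Checking the path once up front, rather than letting the data loader fail deep inside a training step, makes a wrong root obvious on the first sample.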

jainnipun11 commented 2 years ago

Thanks! I resolved the "fixed_path" issue, but now I'm getting a CUDA out-of-memory error:

CUDA out of memory. Tried to allocate 314.00 MiB (GPU 0; 15.78 GiB total capacity; 13.97 GiB already allocated; 240.75 MiB free; 14.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What should I do? Should I decrease the batch size?
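The numbers in that error message can be read off mechanically. The sketch below is plain Python with no PyTorch dependency; `summarize_oom` and the field names are made up for illustration. It converts each quantity to MiB and computes reserved minus allocated, the memory PyTorch has cached but not handed to tensors, which is what the hint's "reserved >> allocated" condition refers to.

```python
import re

# The error text from the comment above, up to the closing parenthesis.
MSG = (
    "CUDA out of memory. Tried to allocate 314.00 MiB (GPU 0; 15.78 GiB total "
    "capacity; 13.97 GiB already allocated; 240.75 MiB free; 14.33 GiB reserved "
    "in total by PyTorch)"
)

_QTY = r"([\d.]+) (MiB|GiB)"  # a number followed by its unit
_FIELDS = {
    "requested": "Tried to allocate " + _QTY,
    "total": _QTY + " total capacity",
    "allocated": _QTY + " already allocated",
    "free": _QTY + " free",
    "reserved": _QTY + " reserved in total",
}


def summarize_oom(text: str) -> dict:
    """Return each quantity in the OOM message in MiB, plus reserved - allocated."""
    out = {}
    for name, pattern in _FIELDS.items():
        m = re.search(pattern, text)
        if m is None:
            raise ValueError(f"could not find {name!r} in message")
        value, unit = m.groups()
        out[name] = float(value) * (1024.0 if unit == "GiB" else 1.0)
    # Memory held by PyTorch's caching allocator but not used by tensors.
    out["cached_unused"] = out["reserved"] - out["allocated"]
    return out


s = summarize_oom(MSG)
print(f"requested {s['requested']:.2f} MiB vs free {s['free']:.2f} MiB")
print(f"reserved - allocated = {s['cached_unused']:.2f} MiB "
      f"({100 * s['cached_unused'] / s['reserved']:.1f}% of reserved)")
```

With the numbers in this thread, the 314 MiB request exceeds the 240.75 MiB the GPU reports as free, and reserved minus allocated is about 369 MiB, roughly 2.5% of what is reserved. Since reserved is not much larger than allocated, tuning `max_split_size_mb` is unlikely to help much on its own; lowering the peak usage (for example via the batch size) is the more direct lever. This is a reading of the message, not a full diagnosis.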

ttumyche commented 2 years ago

Yup, reduce the batch size to fit your GPU's VRAM first. If that does not solve the error, let me know.
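Reducing the batch size can also be automated with a simple backoff loop. This is a hedged sketch, not something in the MedViLL repo: `find_fitting_batch_size` and the `try_step` callback (which should run one forward/backward pass at the given batch size) are hypothetical. It relies on PyTorch raising CUDA OOM as a `RuntimeError` whose message contains "out of memory"; newer versions raise `torch.cuda.OutOfMemoryError`, which subclasses `RuntimeError`, so the check covers both.

```python
from typing import Callable


def is_cuda_oom(err: BaseException) -> bool:
    """Heuristic check for a CUDA out-of-memory error raised by PyTorch."""
    return isinstance(err, RuntimeError) and "out of memory" in str(err)


def find_fitting_batch_size(
    try_step: Callable[[int], None], start: int, minimum: int = 1
) -> int:
    """Halve the batch size until one trial step runs without a CUDA OOM."""
    bs = start
    while bs >= minimum:
        try:
            try_step(bs)  # hypothetical: one training step at batch size bs
            return bs
        except RuntimeError as err:
            if not is_cuda_oom(err):
                raise  # unrelated errors should not be swallowed
            # In real PyTorch code you would typically call
            # torch.cuda.empty_cache() here before retrying.
            bs //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in GPU memory")
```

Halving is coarse: starting from 36 it tries 36, 18, then 9, whereas a manual search found that 15 fits. Stepping down by one or bisecting between the last failure and the last success would find a larger batch size at the cost of more trial steps.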

jainnipun11 commented 2 years ago

Yes! Training started after I reduced the "batch_size" from 36 to 15. Thanks.