NeuromatchAcademy / course-content-dl

NMA deep learning course
https://deeplearning.neuromatch.io/
Creative Commons Attribution 4.0 International

Unsupervised and self-supervised learning, Tutorial 1 #905

Closed — spirosChv closed this issue 9 months ago

spirosChv commented 1 year ago

Torch code that I ran with no problem on Google Colab (default configuration) a year ago is now failing with an out-of-memory error. Specifically, torch.nn.functional.cosine_similarity() now appears to require roughly double the memory it used to, to the point where this fails:

import torch

mat = torch.nn.functional.cosine_similarity(
    torch.rand((1, 4000, 100)),
    torch.rand((4000, 1, 100)),
    dim=2,
)

as the operation requires more than the available RAM (12.7 GB). If I run the same code on a GPU (15 GB of RAM), it crashes with the following error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 5.96 GiB (GPU 0; 14.75 GiB total capacity; 12.05 GiB already allocated; 2.57 GiB free; 12.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
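A back-of-the-envelope calculation (my own accounting, not part of the original report) is consistent with the 5.96 GiB figure in the error: broadcasting the `(1, 4000, 100)` and `(4000, 1, 100)` inputs materializes a `(4000, 4000, 100)` float32 intermediate.

```python
# Size of the broadcast intermediate implied by the error message above.
# (4000, 4000, 100) elements of float32, 4 bytes each.
n, d = 4000, 100
bytes_per_float32 = 4
intermediate_bytes = n * n * d * bytes_per_float32
gib = intermediate_bytes / 2**30
print(f"{gib:.2f} GiB")  # 5.96 GiB -- matches "Tried to allocate 5.96 GiB"
```

The newer PyTorch version apparently allocates two such buffers (numerator and denominator paths), which explains the roughly doubled footprint.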

These errors occur with torch==2.0.1+cu118 and torchvision==0.15.2+cu118. If we downgrade to torch==1.11.0+cu102 and torchvision==0.12.0+cu102, the code runs fine.

For more details, see the issue report on the PyTorch GitHub page: https://github.com/pytorch/pytorch/issues/104564
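As a stopgap, a memory-lean equivalent is possible without waiting for an upstream fix (this is my own sketch, not code from the tutorial): normalize the rows first and take a single matrix product, so the largest tensor ever allocated is the `(4000, 4000)` result rather than a `(4000, 4000, 100)` broadcast intermediate.

```python
import torch


def pairwise_cosine(a, b, eps=1e-8):
    """Cosine similarity between every row of a (n, d) and b (m, d).

    Equivalent to F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=2)
    but never materializes the (n, m, d) broadcast intermediate.
    """
    a = a / a.norm(dim=1, keepdim=True).clamp_min(eps)
    b = b / b.norm(dim=1, keepdim=True).clamp_min(eps)
    return a @ b.t()  # shape (n, m)


mat = pairwise_cosine(torch.rand(4000, 100), torch.rand(4000, 100))
print(mat.shape)  # torch.Size([4000, 4000])
```

Peak memory here is dominated by the 4000 × 4000 float32 output (~61 MiB), versus the ~6 GiB intermediate required by the broadcast form.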

spirosChv commented 1 year ago

A PR, https://github.com/pytorch/pytorch/pull/104771, has been submitted to the PyTorch repo. The regression was introduced by the new PyTorch release, so there is no need to change/update the NMA code.

spirosChv commented 9 months ago

Fixed by https://github.com/NeuromatchAcademy/course-content-dl/pull/934