Hi, thanks for reporting the issue. I am currently at a conference, so I will take a look as soon as possible and get back to you. Sorry for the inconvenience.
I implemented the current dataloader to load all labels and images once at initialization to improve I/O speed. As a result, all of them are kept in RAM.
When you use more than 1 GPU, each GPU spawns a new process, and the cache is not shared between processes. So if one process uses 100GB of RAM, running on 4 GPUs will require ~400GB of RAM.
This behaviour can be changed in data.py: disable the _preload_dataset step and instead load each sample on the fly in __getitem__, as sketched below.
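For reference, here is a minimal sketch of what that on-the-fly loading could look like. The names (`LazyDataset`, `image_paths`, `labels`, `transform`) are placeholders for illustration, not the project's actual classes or attributes:

```python
from torch.utils.data import Dataset
from PIL import Image


class LazyDataset(Dataset):
    """Illustrative sketch: load each sample on demand in __getitem__
    instead of preloading everything at initialization."""

    def __init__(self, image_paths, labels, transform=None):
        # Keep only lightweight references (file paths and labels) in memory;
        # no _preload_dataset-style caching of the raw images here.
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Read the image from disk only when the sample is requested,
        # so each process holds just the samples it is currently using.
        image = Image.open(self.image_paths[idx]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[idx]
```

The trade-off is lower RAM usage at the cost of more disk I/O per batch; increasing the number of DataLoader workers can help hide that latency.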
Thanks for your great work. I tried to use the code to fine-tune on a dataset of 100k images, but got an error. Python error message:
I checked, and the cause was an out-of-memory condition:
It happened after the images were cached and before training started; the log stopped at:
However, it worked well when training on 1 GPU for the first 2 epochs, and then hit the same error on the 3rd epoch. During training, the process used about 20% of memory most of the time, but it sometimes rose to 40% or more. My machine has 500GB of memory in total. I wonder whether the code has a memory leak or whether my machine's RAM is simply insufficient. Sorry to bother you; I am new to machine learning and would really appreciate your help.
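One way to tell a leak apart from plain insufficient RAM is to log the resident memory of the training process once per epoch. A generic sketch using psutil (not part of this repository's code; the training-loop calls in the comment are hypothetical):

```python
import os

import psutil


def log_rss(tag=""):
    """Print the resident set size (RSS) of the current process in GB.

    Called once per epoch, a value that grows steadily across epochs
    suggests a leak, while a roughly flat curve points to the caching
    behaviour described above as the main memory cost.
    """
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3
    print(f"[memory] {tag} RSS = {rss_gb:.1f} GB")


# Hypothetical usage inside a training loop:
# for epoch in range(num_epochs):
#     train_one_epoch(...)
#     log_rss(f"epoch {epoch}")
```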