GlebSBrykin closed this issue 3 years ago
I would go for the second solution, since it is easier to implement (PyTorch makes aggregating model updates quite easy; you just need to call backward() multiple times before each update).
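A minimal sketch of that accumulation loop, assuming a toy model and batch split for illustration (the model, sizes, and loss below are placeholders, not the actual training setup):

```python
import torch
import torch.nn.functional as F

# Gradient accumulation: call backward() on several micro-batches,
# then perform a single optimizer step for the whole logical batch.
# Model, data, and accum_steps are illustrative placeholders.
torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

data, targets = torch.randn(8, 4), torch.randn(8, 2)
accum_steps = 4  # one logical batch of 8 split into 4 micro-batches of 2

optimizer.zero_grad()
for x, y in zip(data.chunk(accum_steps), targets.chunk(accum_steps)):
    loss = F.mse_loss(model(x), y) / accum_steps  # scale to match the full-batch mean
    loss.backward()  # gradients accumulate in p.grad across calls
accumulated = [p.grad.clone() for p in model.parameters()]
optimizer.step()  # one parameter update for the whole logical batch
```

Only one micro-batch of activations lives in memory at a time, which is the point of the trick; the accumulated gradient matches what a single full-batch backward pass would produce.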
What about option 1? It would save more than 1 GB of video memory, which in my case is exactly 1/3 of the total VRAM.
I think that this is significantly trickier to implement. It requires doing a separate forward pass for the linear layer and another for the rest of the network, and then combining the backward passes from each part to perform a model update. That seems much more error-prone.
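For what it's worth, one way to read that split in code is to cut the graph at the backbone output with detach() and run the two backward passes separately. This is only a hedged sketch of the idea; the split point, layer sizes, and all names below are made up for illustration, not taken from the thread:

```python
import torch
import torch.nn.functional as F

# Two-stage backward: backpropagate through the final linear layer first,
# then push the resulting gradient through the rest of the network.
torch.manual_seed(0)
backbone = torch.nn.Linear(8, 4)  # stand-in for "the rest of the network"
head = torch.nn.Linear(4, 2)      # stand-in for the final linear layer

x, y = torch.randn(3, 8), torch.randn(3, 2)

features = backbone(x)
detached = features.detach().requires_grad_(True)  # cut the graph in two
loss = F.mse_loss(head(detached), y)
loss.backward()                   # backward pass 1: through the head only
features.backward(detached.grad)  # backward pass 2: through the backbone
```

The resulting gradients are identical to a single joint backward pass, which illustrates why the bookkeeping (keeping the feature tensor alive, wiring detached.grad back in) is easy to get wrong.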
I decided to start by training something easy, for example SqueezeNet. When trying to start the process on Places365, I ran into a problem: the program complains about the structure of the dataset. It seems the downloaded and unpacked directories need to be reorganized somehow, but how?
The directory structure should mirror the PyTorch-style ImageNet layout. If I remember correctly, there are download options on the Places website that come in a friendlier directory structure.
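Concretely, that layout is one subdirectory per class under each split, which is what torchvision's ImageFolder expects. Below is a stdlib-only sketch that builds such a tree; the split names, class names, and file names are illustrative placeholders:

```python
import pathlib
import tempfile

# Build a pytorch-style ImageNet layout: <root>/<split>/<class>/<image>.
# Class and file names here are placeholders, not the real Places365 listing.
root = pathlib.Path(tempfile.mkdtemp())
for split in ("train", "val"):
    for cls in ("abbey", "airfield"):  # one folder per category
        d = root / split / cls
        d.mkdir(parents=True)
        (d / "00000001.jpg").touch()   # images live directly in the class folder

# torchvision.datasets.ImageFolder(root / "train") would then map each
# class directory to an integer label automatically.
layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.jpg"))
```

If the unpacked Places365 archive does not match this shape, moving each category's images into its own folder under train/ and val/ should satisfy the loader.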
Well, I will try to do so.
Closing this for now, feel free to open another issue with any further questions!
Once again, greetings to everyone! All this time I have been studying the capabilities of the ready-made robust resnet50 presented by the authors of this work and comparing it with the classic resnet50. The features of the robust model are fantastic compared to the regular one! In addition, I have been thinking about how to train VGG19 to be even more robust on my PC. The machine has 3 GB of VRAM and 8 GB of RAM. Beyond what the authors suggested to me in the last issue, my ideas so far are the following: