AAAeray opened this issue 4 years ago
Well, there are some known issues with multi-GPU training since the last couple of PyTorch updates. I will have to investigate this, but it's quite difficult given my schedule right now. The training instability in your resumed run could be caused by improper loading of the state/weights.
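For reference, here is a minimal sketch of how a checkpoint with the model, optimizer, and epoch could be saved and restored in plain PyTorch. The names (`generator`, `g_optimizer`, key strings) are illustrative assumptions, not the actual identifiers in this repo:

```python
import torch

def save_checkpoint(path, epoch, generator, g_optimizer):
    # Persist everything needed to resume: weights, optimizer state, and epoch.
    torch.save({
        "epoch": epoch,
        "generator_state": generator.state_dict(),
        "g_optimizer_state": g_optimizer.state_dict(),
    }, path)

def load_checkpoint(path, generator, g_optimizer, device="cuda"):
    ckpt = torch.load(path, map_location=device)
    # Note: if training used nn.DataParallel, the saved keys may carry a
    # "module." prefix; loading them into an unwrapped model (or vice versa)
    # will not restore the weights correctly.
    generator.load_state_dict(ckpt["generator_state"])
    g_optimizer.load_state_dict(ckpt["g_optimizer_state"])
    return ckpt["epoch"] + 1  # resume training from the next epoch
```

If the optimizer state is not restored (or the keys silently mismatch), the effective training dynamics change on resume, which can show up as exactly the kind of loss jump described below.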
There are 8k+ 256*256 images in my dataset. I set batch_size=32 and trained on 4 12GB GPUs, but the results are not as good as when I set batch_size=4 and trained on a single GPU, in terms of both training speed and image quality. Why does a smaller batch size give better results? Is there an optimal batch size? Also, with batch_size=4, when epoch > 70 the G-loss went up noticeably and the quality of the generated images got worse. Why did this happen? (Training was interrupted at epoch 54; I reloaded the weight files and optimizer states from epoch 53.) Thanks!