Closed · Daksitha closed this issue 1 year ago
Hi @evonneng, I am training the VQGAN with the given trevor data as well as my own data. Is this the usual behaviour of the loss curve?

Thank you for the question. Yes, this is the expected behavior: on the VQ side of training, the loss usually increases before it decreases and saturates. If you are changing the hyperparameters, I find that a smaller embedding size and network size, combined with a weight decay schedule, usually results in more stable training.
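For anyone landing here later: the weight decay schedule mentioned above could look something like the sketch below. This is not taken from the repo's training code; the cosine ramp shape, the start/end values, and the function name are all illustrative assumptions.

```python
import math

def weight_decay_at(step, total_steps, wd_start=1e-4, wd_end=1e-2):
    """Cosine ramp of the weight-decay coefficient from wd_start to wd_end.

    Illustrative only: the schedule shape and the default values are
    assumptions, not the settings used in this repo.
    """
    t = min(step / max(total_steps, 1), 1.0)   # progress in [0, 1]
    ramp = 0.5 * (1.0 - math.cos(math.pi * t)) # 0 at start, 1 at end
    return wd_start + (wd_end - wd_start) * ramp

# In a typical PyTorch loop you would push the scheduled value into the
# optimizer each step, e.g.:
# for group in optimizer.param_groups:
#     group["weight_decay"] = weight_decay_at(step, total_steps)
```

Updating `param_groups` in place like this avoids rebuilding the optimizer every step and works with any `torch.optim` optimizer that supports `weight_decay`.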