So I ran some experiments varying the lr and noticed something strange: training seems to improve when the lr is as low as 1e-7. The loss decreases steadily and the sample images are more consistent. Is that normal, or am I doing something wrong?
Setup:
- GPU: RTX 4090, 48 vCPU, 124 GB RAM
- Batch size: 10
- lr_scheduler: cosine
- lr: 1.2e-7
- Dataset size: 5k
- Captions: tags
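For reference, here is a minimal sketch of what the optimizer/scheduler side of that setup would look like in plain PyTorch; the model and step count are placeholders, not my actual script:

```python
# Minimal sketch of the optimizer/scheduler part of the setup above, in plain
# PyTorch. `model` is a stand-in for whatever is actually being finetuned, and
# num_training_steps is a placeholder; the real run goes through a full
# training script.
import torch

learning_rate = 1.2e-7        # the lr that, surprisingly, looks best so far
num_training_steps = 10_000   # placeholder: roughly epochs * (dataset_size / batch_size)

model = torch.nn.Linear(8, 8)  # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# Cosine decay over the whole run; the scheduler is stepped once per optimizer step.
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_training_steps
)
```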
Also, I'm about to train on an A100 with a 50k training set, and I'm not sure what batch size to pick or how to tune the hyperparameters to take full advantage of the GPU.
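To make the question more concrete, this is the back-of-the-envelope reasoning I'd apply when scaling up, using the common "scale the lr linearly with batch size" heuristic; the target and micro-batch sizes are assumed examples, not measured limits:

```python
# Back-of-the-envelope scaling from the 4090 run (batch 10, lr 1.2e-7) to a
# bigger batch on the A100. The linear lr-scaling heuristic is a rule of thumb,
# not a guarantee, and the batch sizes below are assumed examples.
base_batch_size = 10
base_lr = 1.2e-7

target_batch_size = 40                    # assumed: whatever fits in A100 VRAM
scaled_lr = base_lr * target_batch_size / base_batch_size

# If the target batch doesn't fit in memory in one go, gradient accumulation
# reaches the same effective batch size with smaller per-step micro-batches.
micro_batch_size = 20                     # assumed per-step batch that fits
grad_accum_steps = target_batch_size // micro_batch_size

print(f"scaled lr: {scaled_lr:.2e}, grad accumulation steps: {grad_accum_steps}")
```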