Closed by yunseok624 7 months ago
How long does one training iteration take? You might want to validate less often, or on a smaller set, so that validation takes less time. This seems quite slow.
I trained an ESRGAN, with 4 images as input, for 1 million iterations on S2-NAIP on one A6000 GPU in ~12 days.
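For reference, the figures above can be turned into a rough per-iteration time. This is only back-of-the-envelope arithmetic on the numbers quoted in the comment (about 12 days, 1 million iterations); the function name is made up for illustration, and the result includes whatever validation and logging overhead that run had.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def seconds_per_iteration(total_days: float, iterations: int) -> float:
    """Average wall-clock seconds per iteration over a whole run."""
    return total_days * SECONDS_PER_DAY / iterations

# Numbers quoted above: ~12 days for 1 million iterations.
per_iter = seconds_per_iteration(12, 1_000_000)
print(f"{per_iter:.2f} s per iteration")  # prints "1.04 s per iteration"
```

So a run that is far slower than roughly one second per iteration on comparable hardware is probably spending its time somewhere other than the training step itself.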
What do you mean by 4 images as input?
It takes 2 min for every 100 epochs. I'm using small val_sets. I'm displaying results in wandb every 200 iterations to check whether I should continue or stop the training. Maybe I should validate every 5000 iterations, like you put in the .yml file, and display the result in wandb every 200 iterations.
I mean n_s2_images=4, but that shouldn't affect time per iteration too much.
I think validating every 5000 iterations is a good idea, since validation takes a long time. I think it's set up to run validation with a batch size of 1, so maybe editing the code to work with a larger batch size would speed that up a bit. How long does it take to display the result in wandb for 1 iteration?
It takes at least 25 min, but like you said, I think the problem is with validation.
For the PROBA-V dataset, I validate every 100 iterations, but it trains very quickly and displays on wandb.
Okay, I see. Try validating less during training (you could always run the evaluate script in a separate process while training is going), and see if speeds are ok.
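To get a feel for how much the validation interval matters, here is a rough estimate in Python. It is a sketch under stated assumptions, not a measurement: it assumes the "2 min for every 100" figure refers to training iterations (about 1.2 s each), and that one validation pass accounts for the roughly 25 min mentioned above. The function and parameter names are hypothetical and not taken from the repository.

```python
def total_hours(iterations: int, sec_per_iter: float,
                val_every: int, sec_per_val: float) -> float:
    """Estimated wall-clock hours for training plus periodic validation."""
    train_seconds = iterations * sec_per_iter
    # One validation pass each time the iteration count hits a multiple of val_every.
    val_seconds = (iterations // val_every) * sec_per_val
    return (train_seconds + val_seconds) / 3600

# Assumed figures: 1.2 s per iteration, 25 min (1500 s) per validation pass.
for val_every in (200, 5000):
    hours = total_hours(5000, 1.2, val_every, 25 * 60)
    print(f"validate every {val_every}: {hours:.2f} h for 5000 iterations")
# validate every 200: 12.08 h for 5000 iterations
# validate every 5000: 2.08 h for 5000 iterations
```

Under these assumptions, validating every 200 iterations makes validation the dominant cost, which would match the slowdown described in this thread.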
Hi, can you tell me how long it took you to train ESRGAN on S2-NAIP? I started training with the OSM discriminator & CLIP loss, but I'm still at the 200th iteration for 2.5.