allenai / satlas-super-resolution


Training time #25

Closed yunseok624 closed 7 months ago

yunseok624 commented 7 months ago

Hi, can you tell me how long it took you to train ESRGAN on S2-NAIP? I started training with the OSM discriminator and CLIP loss, but I'm still at the 200th iteration for 2.5.

piperwolters commented 7 months ago

How long does 1 train iteration take? You might want to validate less often or on a smaller set, so that takes less time. This seems quite slow.

I trained an ESRGAN with 4 images as input for 1 million iterations on S2-NAIP, using a single A6000 GPU, in ~12 days.

yunseok624 commented 7 months ago

What do you mean by 4 images as input?

It takes 2 min for every 100 epochs. I'm using small val_sets. I'm displaying results in wandb every 200 iterations to check whether I should continue or stop the training. Maybe I should validate every 5000 iterations, like you put in the .yml file, and display the results in wandb every 200 iterations.
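As a rough sanity check on the numbers in this thread, the two reported speeds can be put on the same scale. This is only a sketch: it assumes "100 epochs" above means 100 training iterations, and the helper name `seconds_per_iteration` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_iteration(total_seconds: float, iterations: int) -> float:
    """Average wall-clock time spent on one training iteration."""
    return total_seconds / iterations


# Reference run from this thread: 1 million iterations in ~12 days.
reference = seconds_per_iteration(12 * SECONDS_PER_DAY, 1_000_000)

# Reported here: 2 minutes per 100 iterations (assumption: "epochs" = iterations).
reported = seconds_per_iteration(2 * 60, 100)

print(f"reference: {reference:.2f} s/iter")  # reference: 1.04 s/iter
print(f"reported:  {reported:.2f} s/iter")   # reported:  1.20 s/iter
```

Under that assumption the raw training speed is close to the reference run, which would point at validation and logging as the source of the slowdown rather than the training step itself.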

piperwolters commented 7 months ago

I mean n_s2_images=4, but that shouldn't affect time per iteration too much.

I think validating every 5000 iterations is a good idea, validation takes a long time. I think it's set up to run validation with a batch size of 1, so maybe editing the code to work with a larger batch size would speed that up a bit. How long does it take to display the result in wandb for 1 iteration?
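To illustrate the idea of logging often while validating rarely, here is a minimal, generic sketch. It is not the training loop of this repository: `run_training`, `train_step`, `validate`, and `log` are hypothetical names, and the stubs below only stand in for real training, validation, and wandb logging.

```python
from typing import Callable, Dict, List


def run_training(
    total_iters: int,
    log_every: int,
    val_every: int,
    train_step: Callable[[int], float],
    validate: Callable[[], float],
    log: Callable[[Dict[str, float]], None],
) -> None:
    """Run a loop where cheap logging and expensive validation use separate intervals."""
    for it in range(1, total_iters + 1):
        loss = train_step(it)
        if it % log_every == 0:
            # Cheap: report the training loss that was already computed.
            log({"iter": it, "train_loss": loss})
        if it % val_every == 0:
            # Expensive: a full pass over the validation set.
            log({"iter": it, "val_score": validate()})


# Stub components so the sketch runs on its own.
records: List[Dict[str, float]] = []
val_calls: List[int] = []


def fake_train_step(it: int) -> float:
    return 1.0 / it


def fake_validate() -> float:
    val_calls.append(1)
    return 0.5


run_training(10_000, log_every=200, val_every=5000,
             train_step=fake_train_step, validate=fake_validate,
             log=records.append)

print(len(val_calls))  # 2 validation passes
print(len(records))    # 52 log records (50 training + 2 validation)
```

With these intervals, the curves still update every 200 iterations, but the slow validation pass only runs twice in 10,000 iterations.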

yunseok624 commented 7 months ago

It takes a minimum of 25 min, but like you said, I think the problem is with validation.

For the PROBA-V dataset, I validate every 100 iterations, but it trains very quickly and the results show up in wandb right away.
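A back-of-the-envelope estimate shows how much the validation interval matters at this cost. The sketch assumes each validation pass takes about 25 minutes (the figure above) and a run of 1 million iterations (the length of the reference run in this thread); `validation_overhead_days` is a hypothetical helper.

```python
def validation_overhead_days(total_iters: int, val_every: int,
                             minutes_per_validation: float) -> float:
    """Total time spent in validation over a whole run, in days."""
    n_validations = total_iters // val_every
    return n_validations * minutes_per_validation / (60 * 24)


every_200 = validation_overhead_days(1_000_000, 200, 25)
every_5000 = validation_overhead_days(1_000_000, 5000, 25)

print(f"validate every 200 iters:  {every_200:.1f} days")   # 86.8 days
print(f"validate every 5000 iters: {every_5000:.1f} days")  # 3.5 days
```

Under these assumptions, validating every 200 iterations would spend far longer in validation than in training, while validating every 5000 iterations keeps the overhead to a few days.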

piperwolters commented 7 months ago

Okay, I see. Try validating less during training (you could always run the evaluate script in a separate process while training is going), and see if speeds are ok.