Hi, thanks for your interest in our work!
1) My first reaction to training from scratch: 2,500 LR/HR pairs is likely not enough data, and 260k iterations is not enough training time. Most of my experiments were trained on ~1.3 million LR/HR pairs for over one million iterations. I did see artifacts similar to the results you are showing in the first half of training.
2) Could you try this finetuning experiment without CLIPLoss? The net_g and net_d weights I emailed you did not use CLIPLoss. It would also be interesting to see loss plots if you have them. I successfully finetuned the S2NAIP model on WorldStrat and got good results, so transferring to a different high-res imagery source should hopefully work after some hyperparameter tuning. (One way to disable the CLIP loss is sketched below.)
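For anyone following along, here is a minimal sketch of stripping the CLIP loss out of a BasicSR-style options file before launching finetuning. The key name `clip_opt` and both file paths are assumptions, not confirmed names from this repo; check your own options YAML for the actual entry.

```python
# Hypothetical sketch: remove the CLIP loss entry from a BasicSR-style
# options file so finetuning runs with only the remaining losses.
# "clip_opt" and both file paths are assumptions -- adjust to your config.
import yaml

with open("options/finetune_s2naip.yml") as f:
    opt = yaml.safe_load(f)

# Drop the CLIP loss if present; pixel/perceptual/GAN losses stay untouched.
opt.get("train", {}).pop("clip_opt", None)

with open("options/finetune_s2naip_no_clip.yml", "w") as f:
    yaml.safe_dump(opt, f)
```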
The baseline ESRGAN model with 8 input Sentinel-2 images, a batch size of 8, and one A6000 GPU was trained from scratch in ~12 days. One could train on multiple GPUs to speed this up.
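As a rough illustration of the multi-GPU route, here is a generic PyTorch DDP skeleton. This is not the repo's actual launcher (BasicSR-based codebases typically support a `--launcher pytorch` flag used with `torchrun` instead), and the model is a stand-in, not ESRGAN.

```python
# Generic DDP skeleton (stand-in model, placeholder loop) -- not the repo's
# launcher. Run with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # Stand-in for the generator: 8 RGB S2 images stacked -> 24 channels.
    model = torch.nn.Conv2d(24, 3, kernel_size=3, padding=1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # ... real training loop over LR/HR pairs goes here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```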
I'm having a similar issue.
@duyquang392 Hi, could you provide more information?
I'm building a dataset where I take 1 Sentinel image per location. My question is: can I build the LR input from that single image using my method, or is it necessary to use 8 images captured at different times (at the same location)? And why do you use different time ranges for the data?
You can use a model that was trained on only 1 image, but I saw the best balance between performance and efficiency when the model took in 8 images. With my 1S2 model, I saw more hallucinations and blurrier generated images.
In general, the idea behind using multiple images from the same location at different times is that satellite imagery of one location will have sub-pixel differences between captures, and by providing multiple different images to the model, it has the opportunity to piece together different information into the final output image.
If you only have 1 Sentinel image in your dataset, you should try 1) the 1S2 model with that image and 2) the 8S2 model with that image repeated 8 times (see the sketch below). I am not sure which would generate better imagery.
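A tiny sketch of option 2). The (T, C, H, W) layout is my assumption, so adapt it to however your dataloader actually stacks the 8 images (some pipelines concatenate along channels instead).

```python
# Sketch: fake an 8-image time series from a single Sentinel-2 chip by
# repetition. The (T, C, H, W) layout is an assumption; some pipelines
# instead concatenate along channels, giving (T*C, H, W).
import numpy as np

def repeat_single_image(chip: np.ndarray, n_frames: int = 8) -> np.ndarray:
    """chip: (C, H, W) single image -> (n_frames, C, H, W) stack."""
    return np.repeat(chip[np.newaxis, ...], n_frames, axis=0)

chip = np.zeros((3, 32, 32), dtype=np.float32)  # placeholder LR chip
stack = repeat_single_image(chip)
print(stack.shape)  # (8, 3, 32, 32)
```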
For the 8S2 model, the 8 images were chosen within 1.5 months of the target high-res image, and I tried to filter out images that were overly cloudy or partially invalid (not fully captured). There are seasonal changes between images chosen this way, but we found very few large, structural changes (like buildings being built). One could further reduce the time range to within 1 month if worried about too much change occurring.
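For reference, a hedged sketch of that selection logic. `scenes`, `cloud_frac`, and `invalid_frac` are hypothetical fields you would compute yourself (e.g. from the Sentinel-2 SCL band); they are not something this repo provides.

```python
# Hypothetical scene-selection sketch: keep up to 8 Sentinel-2 scenes within
# +/- 1.5 months (~45 days) of the target high-res image, dropping overly
# cloudy or partially invalid captures. All field names are assumptions.
from datetime import datetime

def select_scenes(scenes, target_date, window_days=45,
                  max_cloud=0.10, max_invalid=0.05, n_images=8):
    candidates = [
        s for s in scenes
        if abs((s["date"] - target_date).days) <= window_days
        and s["cloud_frac"] <= max_cloud
        and s["invalid_frac"] <= max_invalid
    ]
    # Prefer captures closest in time to the target image.
    candidates.sort(key=lambda s: abs((s["date"] - target_date).days))
    return candidates[:n_images]

# Example: select_scenes(scene_list, datetime(2020, 7, 1))
```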
Hi team members and everyone, first of all, thanks for this interesting repo. I ran into some problems and have a few questions, detailed below:
Why does training from scratch produce such poor results?
Why does training with pre-trained weights seem to get worse the more you train?
Could you give me some recommendations on how to address these problems?
And the last question: what is your hardware configuration for successfully training the S2-NAIP model, and how long does training take?
Thank you all for your support.