simon-donike closed this issue 6 months ago
Hi Simon, thank you for your words and interest in this project!
The super-res outputs on Satlas use as many S2 images as available. So for each location, the pipeline checks how many S2 images are available, and then uses the pre-trained model that takes in that number of input images (or close to that number). I found that having at least 8 S2 images vastly improves performance compared to 1, 2, or 4 S2 images.
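The selection logic described above could be sketched roughly as follows; `pick_model` and `AVAILABLE_VARIANTS` are illustrative names, not code from the actual pipeline:

```python
# Hypothetical sketch of the checkpoint-selection logic described above.
AVAILABLE_VARIANTS = [1, 2, 4, 8, 16]  # the {1,2,4,8,16}_S2 checkpoints

def pick_model(n_available):
    """Pick the largest variant not exceeding the available S2 image count."""
    usable = [v for v in AVAILABLE_VARIANTS if v <= n_available]
    return max(usable) if usable else min(AVAILABLE_VARIANTS)

print(pick_model(10))  # 8: closest variant at or below 10 images
```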
If you're focused on SISR, I imagine this model won't be the best out there, considering I ran experiments with 1 S2 image mostly out of curiosity, and didn't spend much time optimizing that specific model. I spent more time optimizing the models that took in 16 S2 images.
It's also worth noting that the output visualized on Satlas for this specific location might not be very accurate either. It's been a challenge to get accurate and realistic outputs for places outside of the US and EU.
Alright, thanks!
Yeah, I got the same results in the US with sentinel-2, perhaps the available checkpoints are not trained to the last epoch.
@simon-donike @amirafshari Reopening this issue because a bug was recently pointed out to me in the dataset - a reshaping error was causing the model to see bands drawn from different images rather than all bands of the same image (see issue 30).
This bug was introduced when I made a new, cleaned-up repo from my research repo and switched from skimage.io.imread to torchvision.io.read_image. I pushed a fix to this repo and have been retraining the {1,2,4,8,16}_S2 models. I will update the weights next week, once training is complete and if the outputs look better.
Following up on this. @simon-donike @amirafshari
The bug was essentially feeding the model R,G,B bands from different timestamped images. So the first image in a batch would have R band from T0, G band from T1, and B band from T2. See in this example that the B band contains clouds but R and G do not.
With the fix, the bands are all from the same timestamp.
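The kind of reshaping error described above can be reproduced in a small numpy sketch (the shapes and values are illustrative, not the actual dataset code):

```python
import numpy as np

# A time series of T RGB images stored as a flat (T*3, H, W) band stack,
# where band t*3 + c is channel c of timestamp t; value 10*t + c for clarity.
T, H, W = 4, 2, 2
stack = np.stack([np.full((H, W), 10 * t + c) for t in range(T) for c in range(3)])

# Buggy read: treating the stack as channel-major (3, T, H, W) mixes
# timestamps. The first "image" gets R from t0, G from t1, B from t2.
buggy = stack.reshape(3, T, H, W).transpose(1, 0, 2, 3)
print([int(b[0, 0]) for b in buggy[0]])  # [0, 11, 22]

# Correct read: the stack is time-major (T, 3, H, W), so each image
# keeps all three bands from the same timestamp.
fixed = stack.reshape(T, 3, H, W)
print([int(b[0, 0]) for b in fixed[0]])  # [0, 1, 2]
```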
I am currently training the {1,2,4,8,16}_S2 models, and they are at 800k iterations. I will continue training to ~1.5mil iterations. But I have uploaded the 800k iterations checkpoint for the generator, if anyone would like to test it out and/or confirm the bug is fixed. I will upload the final checkpoints once training is complete.
Thanks for the update @piperwolters. If I understand correctly, this bug should not affect SISR?
@simon-donike I think even SISR would be affected, since the dataset loads a file containing a time series of Sentinel-2 images within 3 months of the corresponding NAIP image, and the lines that contained the bug then extract X of those Sentinel-2 images. So even when X=1, it was still pulling the R, G, B bands from different Sentinel-2 images.
I have gotten feedback from another user that this 800.pth checkpoint already looks better.
Model weights for {1,2,4,8,16}-S2-images models have been updated. The bug in the code is fixed, so hopefully everyone sees improved outputs.
Please let me know if you run into further issues!
Hi Piper, thanks for the updates and the new checkpoint.
Here are my results for SISR with the new weights, in this case the outskirts of Tampere in Finland (61.472, 23.841). There does not seem to be a substantial change in SR quality. Do you think this is still related to the SISR-MISR differences, or does it look like there is a more basic error?
While I still think SISR outputs will not look as good as the Satlas map, since almost all of those outputs used 8+ images as input, your image makes it look like there could be a normalization bug. Are you using L1C imagery, preprocessed as described here?
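A normalization issue like the one suspected above can be sanity-checked with a minimal sketch; note that the TCI-style 8-bit RGB scaled to [0, 1] assumed here is my guess, so consult the repo's preprocessing docs for the exact recipe:

```python
import numpy as np

# Hedged sanity check: assuming the model expects 8-bit TCI RGB scaled
# to [0, 1] (an assumption, not the confirmed preprocessing recipe).
def normalize_tci(img_uint8):
    return np.clip(img_uint8.astype(np.float32) / 255.0, 0.0, 1.0)

patch = np.array([[0, 128, 255]], dtype=np.uint8)
print(normalize_tci(patch))  # all values inside [0, 1]
```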
I double-checked and yes - that's exactly the input. I experimented with TCI-like inputs and with L1C data, preprocessed as described. To validate, I stacked the image 8 times and used the 8-image MISR checkpoint. Does this come closer to your results, or is there also another issue in your opinion?
Do you have any other ideas, or recently ran SISR on your own end? Would be awesome if we could get it to work!
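The "stack one image 8 times" check described above can be sketched as follows (numpy stand-in; the N*3 channel layout for the N-image checkpoint is an assumption about the model's expected input):

```python
import numpy as np

# Repeat a single normalized RGB S2 patch 8 times along the channel axis,
# producing the (N*3, H, W) layout assumed for the 8-image checkpoint.
img = np.random.rand(3, 32, 32).astype(np.float32)  # one normalized RGB patch
stacked = np.tile(img, (8, 1, 1))                   # -> (24, 32, 32)
print(stacked.shape)
```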
@simon-donike Could you tell me the web-mercator tile that this input/output is from? I will run it on my end and make sure I get the same result. But yes, I imagine the multi-image super resolution outputs will look better than the single-image - it is interesting that just repeating one image 8 times looks so much better than using one image.
@piperwolters Sorry for the late response, I was away for a week. IIRC this image is in Tampere, Finland at 61.5028, 23.7136 or at Web Mercator coordinates 2642463, 8740441. Thanks a lot for having a look yourself!
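For anyone wanting to reproduce the coordinate lookup, the lat/lon above can be converted to Web Mercator (EPSG:3857) meters with the standard spherical formulas; the result lands within a few km of the quoted coordinates, which were given from memory:

```python
import math

# Standard spherical Web Mercator (EPSG:3857) forward projection,
# using Earth radius R = 6378137 m.
R = 6378137.0

def lonlat_to_webmercator(lon_deg, lat_deg):
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

x, y = lonlat_to_webmercator(23.7136, 61.5028)
print(round(x), round(y))  # close to the quoted (2642463, 8740441)
```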
Hi Piper, first of all thanks for providing this repository and the weights - the structure and codebase made re-implementation very intuitive. The transparency and reproducibility that you provide is what the field of remote sensing SR desperately needs.
The Question:
I am running SISR on S2 images and get good results in rural and suburban areas. As soon as more densely built-up areas are present in a patch, though, the model strongly hallucinates and deviates very far from both the ground truth and the input imagery. I'm using the esrgan_1S2.pth checkpoint. Cross-checking my results with the same area on the Satlas SR map shows a very large difference in quality. Do you have any idea what might cause this, or have you seen similar results before?
Thanks in advance Simon
Images
This example is in Buenos Aires, Argentina
1. Satlas Screenshot
2. My Results
3. Input S2 Image