allenai / satlas-super-resolution

Apache License 2.0

Inference Results - Strong Hallucinations in Urban Areas #27

Closed simon-donike closed 2 months ago

simon-donike commented 3 months ago

Hi Piper, first of all, thanks for providing this repository and the weights - the structure and codebase made re-implementation very intuitive. The transparency and reproducibility that you provide are what the field of remote sensing SR desperately needs.

The Question:
I am running SISR on S2 images and get good results in rural and suburban areas. As soon as more densely built-up areas are present in a patch, though, the model hallucinates strongly and deviates very far from both the ground truth and the input imagery. I'm using the esrgan_1S2.pth checkpoint. Cross-checking my results against the same area on the Satlas SR map shows a very large difference in quality.
Do you have any idea what might cause this, or have you seen similar results before?

Thanks in advance, Simon

Images

This example is in Buenos Aires, Argentina

1. Satlas Screenshot

[Image: screenshot, 2024-04-23 11:49:40]

2. My Results

[Image: screenshot, 2024-04-23 11:37:48]

3. Input S2 Image

[Image: screenshot, 2024-04-23 11:37:41]

piperwolters commented 3 months ago

Hi Simon, thank you for your words and interest in this project!

The super-res outputs on Satlas use as many S2 images as available. So for each location, the pipeline checks how many S2 images are available, and then uses the pre-trained model that takes in that number of input images (or close to that number). I found that having at least 8 S2 images vastly improves performance compared to 1, 2, or 4 S2 images.

If you're focused on SISR, I imagine this model won't be the best out there, considering I ran experiments with 1 S2 image mostly out of curiosity, and didn't spend much time optimizing that specific model. I spent more time optimizing the models that took in 16 S2 images.

It's also worth noting that the output visualized on Satlas for this specific location might not be very accurate either. It's been a challenge to get accurate and realistic outputs for places outside of the US and EU.
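The model selection described above (count the available S2 images, then use the pre-trained model for that number or close to it) could be sketched roughly as follows. This is only an illustration: the function name is hypothetical, and rounding down to the largest supported count is an assumption, since the thread does not say how "close to that number" is resolved. The supported counts are the {1,2,4,8,16} variants mentioned later in this thread.

```python
# Input sizes of the pre-trained variants named in this thread.
SUPPORTED_COUNTS = (1, 2, 4, 8, 16)


def pick_model_size(n_available, supported=SUPPORTED_COUNTS):
    """Return the largest supported input count not exceeding n_available.

    Assumption: round down, so the model never needs more images
    than the location actually has.
    """
    if n_available < min(supported):
        raise ValueError("need at least one Sentinel-2 image")
    return max(count for count in supported if count <= n_available)


# A location with 11 usable images would fall back to the 8-image model.
size_for_11 = pick_model_size(11)
```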

simon-donike commented 3 months ago

Alright, thanks!

amirafshari commented 3 months ago

Yeah, I got the same results in the US with Sentinel-2. Perhaps the available checkpoints were not trained to the last epoch.

piperwolters commented 3 months ago

@simon-donike @amirafshari Reopening this issue because a bug in the dataset code was recently pointed out to me - a reshaping error meant the model was not seeing all bands of all images (see issue 30).

This bug was introduced when I made a new, cleaned-up repo from my research repo and switched from skimage.io.imread to torchvision.io.read_image. I pushed a fix to this repo, and have been retraining the {1,2,4,8,16}_S2 models. I will update the weights next week, once training is complete and if the outputs look better.

piperwolters commented 3 months ago

Following up on this. @simon-donike @amirafshari

The bug was essentially feeding the model R,G,B bands from different timestamped images. So the first image in a batch would have the R band from T0, the G band from T1, and the B band from T2. In this example, the B band contains clouds but R and G do not:

[Images: old_band0, old_band1, old_band2]

With the fix, the bands are all from the same timestamp:

[Images: new_band0, new_band1, new_band2]

I am currently training the {1,2,4,8,16}_S2 models, and they are at 800k iterations. I will continue training to ~1.5M iterations. In the meantime, I have uploaded the 800k-iteration checkpoint for the generator, if anyone would like to test it out and/or confirm the bug is fixed. I will upload the final checkpoints once training is complete.
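To make this class of error concrete, here is a minimal NumPy sketch of how a channel-first array, reshaped as if it were already time-major, ends up with one "image" whose channels come from different timestamps. The array layout (images stacked along the height axis) and all names are assumptions for illustration. This is not the repository's code, and the exact band mix in the real bug may differ from what this toy example produces.

```python
import numpy as np

T, H, W = 4, 2, 2  # hypothetical: 4 timestamps of tiny 2x2 images

# Channel-first stack of shape (3, T*H, W), images stacked along height.
# Each pixel stores 10 * timestamp + band, so its origin stays visible.
chw = np.empty((3, T * H, W), dtype=np.int64)
for band in range(3):
    for t in range(T):
        chw[band, t * H:(t + 1) * H, :] = 10 * t + band

# Wrong: treats the buffer as if it were already ordered (time, band, H, W).
wrong = chw.reshape(T, 3, H, W)

# Right: split the height axis into (time, H), then move time to the front.
right = chw.reshape(3, T, H, W).transpose(1, 0, 2, 3)


def channel_timestamps(image):
    """Return the source timestamp of each of the three channels."""
    return [int(image[c, 0, 0]) // 10 for c in range(3)]


wrong_ts = channel_timestamps(wrong[0])  # channels drawn from several timestamps
right_ts = channel_timestamps(right[0])  # all channels from timestamp 0
```

Encoding the origin in the pixel values, as done here, is also a cheap way to unit-test a data loader after swapping image-reading libraries, because the two libraries may return different axis orders.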

simon-donike commented 3 months ago

Thanks for the update @piperwolters. If I understand correctly, this bug should not affect SISR?

piperwolters commented 3 months ago

@simon-donike I think even SISR would be affected. The dataset loads a file containing a time series of Sentinel-2 images taken within 3 months of the corresponding NAIP image, and the lines that contained the bug then extract X of those Sentinel-2 images. So when X=1, it is still pulling R,G,B bands from different Sentinel-2 images.

I have gotten feedback from another user that this 800.pth checkpoint already looks better.

piperwolters commented 2 months ago

Model weights for {1,2,4,8,16}-S2-images models have been updated. The bug in the code is fixed, so hopefully everyone sees improved outputs.

Please let me know if you run into further issues!

simon-donike commented 2 months ago

Hi Piper, thanks for the updates and the new checkpoint.
Here are my results for SISR with the new weights, in this case the outskirts of Tampere in Finland (61.472, 23.841). There does not seem to be a substantial change in SR quality. Do you think this is still related to the SISR-MISR differences, or does it look like there is a more basic error?

Sen-2

S2

SISR Results

SR

Satlas Screenshot

Satlas

piperwolters commented 2 months ago

While I still think SISR outputs will not look as good as the Satlas map, since almost all of those outputs used 8+ images as input, your image makes it look like there could be a normalization bug. Are you using L1C imagery, preprocessed as described here?

simon-donike commented 2 months ago

I double-checked and yes - that's exactly the input. I experimented with TCI-like inputs and with L1C data, preprocessed as described. To validate, I stacked the image 8 times and used the 8-image MISR checkpoint. Does this come closer to your results, or is there also another issue in your opinion?

8-image MISR

[Image: screenshot, 2024-05-24 09:52:16]

L1C

[Image: screenshot, 2024-05-24 09:52:20]

Do you have any other ideas, or have you recently run SISR on your end? It would be awesome if we could get it to work!
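For anyone wanting to repeat the stacking test described above, a minimal sketch of repeating one frame 8 times might look like this. It assumes the multi-image model expects its frames concatenated along the channel axis, which the thread does not confirm; the function name and the 32x32 patch size are hypothetical.

```python
import numpy as np


def repeat_frame(image, n_frames=8):
    """Tile one (C, H, W) image into an (n_frames * C, H, W) array.

    Assumption: frames are concatenated along the channel axis,
    giving R,G,B,R,G,B,... for an RGB input.
    """
    if image.ndim != 3:
        raise ValueError("expected a (C, H, W) array")
    return np.tile(image, (n_frames, 1, 1))


# Hypothetical single Sentinel-2 RGB patch with values in [0, 1).
frame = np.random.rand(3, 32, 32).astype(np.float32)
stacked = repeat_frame(frame)  # shape (24, 32, 32)
```

If the model instead expects a separate time axis, `np.stack([image] * n_frames)` would give an (n_frames, C, H, W) array.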

piperwolters commented 2 months ago

@simon-donike Could you tell me the web-mercator tile that this input/output is from? I will run it on my end and make sure I get the same result. But yes, I imagine the multi-image super resolution outputs will look better than the single-image - it is interesting that just repeating one image 8 times looks so much better than using one image.

simon-donike commented 2 months ago

@piperwolters Sorry for the late response, I was away for a week. IIRC this image is in Tampere, Finland at 61.5028, 23.7136 or at Web Mercator coordinates 2642463, 8740441. Thanks a lot for having a look yourself!
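For readers who need to go from a latitude/longitude pair to Web Mercator coordinates or a tile index, the standard spherical Web Mercator (EPSG:3857) and XYZ tile formulas can be sketched as below. The function names are hypothetical, and the zoom level is a free parameter because the thread does not state which zoom the Satlas tiles use.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 degrees to Web Mercator metres (x, y)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Convert WGS84 degrees to XYZ tile indices (column, row) at a zoom level."""
    n = 2 ** zoom
    col = int((lon_deg + 180.0) / 360.0 * n)
    row = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


# The Tampere coordinates quoted above; the zoom value here is only a placeholder.
x_m, y_m = lonlat_to_web_mercator(23.7136, 61.5028)
tile_col, tile_row = lonlat_to_tile(23.7136, 61.5028, zoom=13)
```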