Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Regression checkerboard #317

Open · TaniaJG opened this issue 1 month ago

TaniaJG commented 1 month ago

Hello, I am using Clay for pixel-wise regression on water images. I have fine-tuned the model, but when running predictions I get a checkerboard pattern in the output. Any ideas about what is happening?

I have seen that the model outputs at half resolution and then interpolates the prediction back to the same size as the input. Is it possible to get a prediction from the model at the input size directly, without this interpolation?

Could this checkerboard pattern be related to using water images?

[Image: ex_checkerboard_pattern]

Thanks!!

srmsoumya commented 1 month ago

@TaniaJG We conducted a pixel-wise regression for AGB, and the details are available in the Clay documentation. We didn't observe any checkerboard patterns during this process, but the results do appear somewhat pixelated because we upsampled the predictions from half the image size.

We can generate predictions at the original image resolution by adding another upsampling layer in the model's fusion step. I’m not certain if this issue is specific to water images, but it would be worth testing this hypothesis further.
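
Roughly, the idea is one extra 2x upsampling stage on top of the current half-resolution output. A minimal sketch in plain PyTorch (layer names are illustrative only, not the actual Clay fusion code):

```python
import torch
import torch.nn as nn

class FullResolutionHead(nn.Module):
    """Illustrative sketch: one extra 2x upsampling stage so the regression
    output matches the input resolution instead of half of it."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.head = nn.Sequential(
            # half-resolution feature map -> full resolution
            nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2),
            nn.GELU(),
            nn.Conv2d(in_channels // 2, 1, kernel_size=1),  # single regression channel
        )

    def forward(self, x):
        return self.head(x)

# Example: a (B, 64, 112, 112) feature map becomes a (B, 1, 224, 224) prediction.
print(FullResolutionHead(64)(torch.randn(2, 64, 112, 112)).shape)
```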

TaniaJG commented 1 month ago

Thank you Soumya for your fast response. I added another upsampling layer where you pointed, to get the original resolution in the prediction, but this does not solve the problem. I will keep investigating what is happening here.

Thanks!!

TaniaJG commented 3 weeks ago

Hi again!! I have been investigating this issue and have not been able to solve it yet.

By the way, I tried fine-tuning with patch_size=16 in the SegmentEncoder() called by the Regressor(), and now I don't get the previous striped pattern, but the result looks pixelated (see figure). Is it OK to use patch sizes other than 8?

[Image: fmc_ps16]

Another thing that comes to mind: the model makes mini-patches of size 8x8, calculates the embeddings for each one, and then upsamples them (with Conv) and downsamples them (with MaxPool) to build the FPN. Would it make sense to make mini-patches of different sizes and calculate the embeddings for each size, to build a "more realistic" FPN? I mean, instead of upsampling/downsampling the embeddings computed at patch size 8x8, we would directly have embeddings at different sizes (8x8, 16x16, 32x32, ...), roughly as in the sketch below. I would be grateful if you could clarify whether this is a reasonable approach, also taking into account memory consumption.
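
Something like this is what I have in mind (purely illustrative, plain PyTorch, not the current Clay code):

```python
import torch
import torch.nn as nn

class MultiPatchEmbed(nn.Module):
    """Illustrative sketch: embed the same image at several patch sizes so the
    FPN gets one feature map per scale directly, instead of resampling a single
    patch-size-8 embedding. Memory grows with the smallest patch size, which
    dominates the token count."""

    def __init__(self, in_chans=4, dim=256, patch_sizes=(8, 16, 32)):
        super().__init__()
        self.embeds = nn.ModuleList(
            nn.Conv2d(in_chans, dim, kernel_size=p, stride=p) for p in patch_sizes
        )

    def forward(self, x):
        # Each conv yields a (B, dim, H/p, W/p) map -> one FPN pyramid level per patch size.
        return [embed(x) for embed in self.embeds]

# Example: a 224x224 input gives feature maps of 28x28, 14x14 and 7x7.
maps = MultiPatchEmbed()(torch.randn(2, 4, 224, 224))
print([m.shape for m in maps])
```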

Thanks in advance for your great project!!

kjtheron commented 3 weeks ago

I have encountered the same checkerboard pattern when fine-tuning a regression model to predict canopy height on 0.5 m RGB and NIR image patches of size 224x224.

Are there any recommendations on trying different feature maps or patch sizes for the Regressor()?

Any guidance would be appreciated.

TaniaJG commented 2 weeks ago

Hi, I think I solved the problem. I corrected the standardization values given in metadata.yml by subtracting the mean from the listed std value. For example, for the S2 red band, the mean was 1552 and the std was 1888. That std value seemed odd; it looked like mean + std rather than the std itself. I computed 1888 - 1552 to get std = 336, and the striped pattern is now gone. I still get the typical checkerboard pattern, but that will probably be removed by replacing ConvTranspose2d with PixelShuffle.
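
To make the fix concrete, this is roughly the correction (the numbers are the S2 red-band values from metadata.yml; the snippet is a sketch with dummy data, not the actual pipeline code):

```python
import numpy as np

# Values listed in metadata.yml for the Sentinel-2 red band.
mean = 1552.0
listed_std = 1888.0  # appears to be mean + std rather than std itself

std = listed_std - mean  # 1888 - 1552 = 336

# Normalize a band with the corrected std (dummy data for illustration).
band = np.random.randint(0, 4000, size=(224, 224)).astype(np.float32)
band_normalized = (band - mean) / std
```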

Could you please clarify why the std values in metadata.yml have the mean values added to them? Thanks!

[Image: prediction after correcting the standardization values]

kjtheron commented 2 weeks ago

Thanks, @TaniaJG for identifying the source of the checkerboard pattern.

In my case, I had to calculate the mean and std for my own imagery. However, my statistics were originally calculated with the nodata value of 0 included, which skewed them. After correcting the mean and std normalization values, the checkerboard pattern went away.
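
For anyone hitting the same thing, this is roughly how I now compute the band statistics with the nodata value excluded (a sketch on dummy data; the nodata value of 0 is specific to my imagery):

```python
import numpy as np

nodata = 0  # nodata value that was skewing my statistics

# One image band as a 2D array (dummy data for illustration).
band = np.random.randint(1, 4000, size=(224, 224)).astype(np.float32)
band[:16, :16] = nodata  # pretend a corner of the patch is nodata

valid = band[band != nodata]
mean, std = valid.mean(), valid.std()
print(mean, std)
```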

All that remains now is to upsample the results to the full resolution.
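
For that remaining step, a plain bilinear upsample of the half-resolution prediction back to the input size should be enough (generic PyTorch sketch, not Clay-specific code):

```python
import torch
import torch.nn.functional as F

# Half-resolution regression output for a 224x224 input (dummy example).
pred = torch.randn(2, 1, 112, 112)

pred_full = F.interpolate(pred, size=(224, 224), mode="bilinear", align_corners=False)
print(pred_full.shape)  # torch.Size([2, 1, 224, 224])
```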