PatrickTUM / UnCRtainTS

https://patricktum.github.io/cloud_removal/

Simple example of cloud removal #3

Closed CatalinVulcan closed 1 year ago

CatalinVulcan commented 1 year ago

Hello Patrick,

I'm interested in using your code to remove clouds from my own dataset.

Can you please provide a minimal example image and the code to run inference on it with one of the pre-trained models?

This would greatly help me understand the required data structure and the basics of how the code operates, without downloading the large dataset.

Thank you

GKG1312 commented 1 year ago

Hello @CatalinVulcan and @PatrickTUM, I am trying to use this code on custom data and have arranged my data as per the original data directory structure, but I still need to figure out where and how to add those paths.

In the same SEN12MSCR data folder, I added folders for a custom dataset with a similar directory structure, as shown below:

../SEN12MSCR/:
            ├───ROIs2021_winter_s1
            |   ├─s1_1
            |   |   |...
            |   |   ├─ROIs2021_winter_s1_1_p15.tif
            |   |   |...
            |    ...
            ├───ROIs2021_winter_s2_cloudy
            |   ├─s2_cloudy_1
            |   |   |...
            |   |   ├─ROIs2021_winter_s2_cloudy_1_p15.tif
            |   |   |...
            |    ...
            ...

I tried to change the test split for the complete dataset in dataLoader.py, in class SEN12MSCR(Dataset), as follows:

self.splits['test'] = ['ROIs2021_winter_s1/s1_1', 'ROIs2021_winter_s1/s1_2']

but while running test_reconstruct.py with the following command

python test_reconstruct.py --experiment_name trial_1  --input_t 3 --region all --export_every 1 --res_dir ./inference --weight_folder ./results --root3 /home/girish/Desktop/MS_Work/new_CR_method/SEN12MSCR/ 

I am getting the following error:

Processing paths for test split of region all 100%|██████████| 5/5 [00:03<00:00, 1.58it/s]
/home/girish/Desktop/MS_Work/new_CR_method/UnCRtainTS/data/dataLoader.py:674: UserWarning: No data samples found! Please use the following directory structure:

    path/to/your/SEN12MSCR/directory:
        ├───ROIs1158_spring_s1
        |   ├─s1_1
        |   |   |...
        |   |   ├─ROIs1158_spring_s1_1_p407.tif
        |   |   |...
        |    ...
        ├───ROIs1158_spring_s2
        |   ├─s2_1
        |   |   |...
        |   |   ├─ROIs1158_spring_s2_1_p407.tif
        |   |   |...
        |    ...
        ├───ROIs1158_spring_s2_cloudy
        |   ├─s2_cloudy_1
        |   |   |...
        |   |   ├─ROIs1158_spring_s2_cloudy_1_p407.tif
        |   |   |...
        |    ...
        ...

    Note: Please arrange the dataset in a format as e.g. provided by the script dl_data.sh.

warnings.warn("""No data samples found! Please use the following directory structure: Loading checkpoint ./results/trial_1/model.pth.tar Testing . . . Traceback (most recent call last): File "/home/girish/Desktop/MS_Work/new_CR_method/UnCRtainTS/model/test_reconstruct.py", line 119, in main(config) File "/home/girish/Desktop/MS_Work/new_CR_method/UnCRtainTS/model/testreconstruct.py", line 110, in main , test_img_metrics = iterate(model, data_loader=test_loader, config=config, writer=writer, File "/home/girish/Desktop/MS_Work/new_CR_method/UnCRtainTS/model/train_reconstruct.py", line 280, in iterate if len(data_loader) == 0: raise ValueError("Received data loader with zero samples!") ValueError: Received data loader with zero samples!

Note: I think the issue is with pairing the data. In dataLoader.py, the function get_paths checks whether any of the files [S1, S2, S2_cloudy] is missing and, if so, excludes that sample from the test split. I want to know why the S2 folder is needed during testing; I assume that folder contains the cloud-free images. It would be great if any of you could help me with this. Thank you.

PatrickTUM commented 1 year ago

Hi @CatalinVulcan and @GKG1312,

the provided dataloaders are written with SEN12MS-CR and SEN12MS-CR-TS in mind. Their built-in completeness checks only keep samples for which paired cloud-free Sentinel-2 data is available, in line with the design of the datasets: in both, the cloud-free data serves as reference for supervised training and for evaluation at test time.
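
In simplified terms, that check boils down to something like the following (a sketch of the idea only, not the exact code of get_paths in dataLoader.py):

import os

def keep_complete_samples(triplets):
    # triplets: list of (path_S1, path_S2, path_S2_cloudy) tuples
    # a sample is kept only if all three files exist on disk;
    # anything with a missing modality is silently dropped, which is
    # why an incomplete custom dataset ends up with zero test samples
    kept = []
    for p_s1, p_s2, p_s2_cloudy in triplets:
        if all(os.path.isfile(p) for p in (p_s1, p_s2, p_s2_cloudy)):
            kept.append((p_s1, p_s2, p_s2_cloudy))
    return kept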

To my understanding, you would like to apply the provided code or models to custom data that may not share the same structure as the original datasets. Basically, you can do this in at least two different ways:

  1. If you don't have or need cloud-free Sentinel-2 data but would nonetheless like to build upon the existing dataloader, then you may simply duplicate other Sentinel-2 data and place it as dummy files into the respective cloud-free directories. This solution is rather dirty, but as long as you only wish to obtain cloud-removed predictions without any further evaluation, it will yield what you want (see the short sketch at the end of this comment).
  2. You write your own custom scripts, relying solely on (cloudy) Sentinel-2 (plus, if utilized, Sentinel-1) data. This nonetheless supposes that the provided Sentinel-1 and Sentinel-2 Level-1C data are preprocessed as in Google Earth Engine and subsequently normalized as in the dataloader, because these are the statistics the networks were trained on. Depending on what data and model you want to work with, you may find variations of the following code useful to build upon:
# import, initialize and load the model from current working directory ./model
import sys, os, torch
sys.path.append(os.path.dirname(os.getcwd()))
from src.backbones import uncrtaints
from data.dataLoader import read_tif, read_img, process_MS, process_SAR
from datetime import datetime

to_date = lambda string: datetime.strptime(string, '%Y-%m-%d')
S1_LAUNCH = to_date('2014-04-03')
S1_BANDS, S2_BANDS = 2, 13

model = uncrtaints.UNCRTAINTS(input_dim=S1_BANDS+S2_BANDS,
    encoder_widths=[128],
    decoder_widths=[128, 128, 128, 128, 128], 
    out_conv=[2*S2_BANDS],
    out_nonlin_mean=True,
    out_nonlin_var="softplus",
    agg_mode="att_group",
    encoder_norm="group",
    decoder_norm="batch",
    n_head=16,
    d_model=256,
    d_k=4,
    pad_value=0,
    padding_mode="reflect",
    positional_encoding=True,
    covmode="diag",
    scale_by=10.0,
    separate_out=False,
    use_v=False,
    block_type='mbconv',
    is_mono=False)
trained_checkp = "/media/pwjebel/Hinton/Downloads/model.pth.tar"
pretrained_dict = torch.load(trained_checkp, map_location="cpu")["state_dict_G"]
model.load_state_dict(pretrained_dict, strict=True)
model.eval()

# fetch and prepare custom data
path_S1, path_S2 = ['put paths to your custom data here'], ['following GEE pre-processing pipeline']
dates_S1 = dates_S2 = [(to_date(date)-S1_LAUNCH).days for date in ['date_1', '...', 'date_T']] # simply set to None, if mono-temporal
dates = torch.stack((torch.tensor(dates_S1),torch.tensor(dates_S2))).float().mean(dim=0)[None]

in_S1 = torch.Tensor([process_SAR(read_img(read_tif(path)), 'default') for path in path_S1])[None]
in_S2 = torch.Tensor([process_MS(read_img(read_tif(path)), 'default') for path in path_S2])[None]
real_A = torch.cat((in_S1, in_S2),dim=2)*10 # resulting tensor is [Batchsize x Time x Channels x Height x Width]

# forward propagation at inference time
with torch.no_grad():
    fake_B = model(real_A, batch_positions=dates)
mean = fake_B[:, :, :S2_BANDS, ...]/10
var = fake_B[:, :, S2_BANDS:, ...]/100
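
To quickly inspect a prediction, you could for instance export an RGB preview along these lines (just a sketch; it assumes the 13 Sentinel-2 bands are stored in their natural order, so B4/B3/B2 sit at channel indices 3/2/1, and that values after the division above lie roughly in [0, 5]):

import numpy as np
import matplotlib.pyplot as plt

pred = mean[0, 0].cpu().numpy()            # first batch item, first output time step: [Channels x Height x Width]
rgb  = pred[[3, 2, 1]].transpose(1, 2, 0)  # assumed band order B1..B12, so B4/B3/B2 sit at indices 3/2/1
rgb  = np.clip(rgb / 5.0, 0.0, 1.0)        # assumed value range after the division above, adjust if needed
plt.imsave('prediction_rgb.png', rgb)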

For a brief explanation of the scaling constant, see here. Thanks to the way the correlative temporal attention mechanism is designed, you may initialize a multi-temporal model and load a corresponding pre-trained checkpoint, which can subsequently be applied to input time series of varying length.
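
Regarding option 1 above, a minimal sketch of how such dummy files could be created, assuming the SEN12MSCR directory and file naming shown in the warning above (the paths are illustrative, and the copied "cloud-free" images are of course only placeholders, so any metrics computed against them are meaningless):

import os, glob, shutil

root = '/path/to/your/SEN12MSCR'  # adjust to your dataset root
for src in glob.glob(os.path.join(root, '*_s2_cloudy', '*', '*.tif')):
    # copy each cloudy S2 tile into the corresponding cloud-free location,
    # e.g. ROIs2021_winter_s2_cloudy/s2_cloudy_1/... -> ROIs2021_winter_s2/s2_1/...
    dst = src.replace('s2_cloudy', 's2')
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if not os.path.exists(dst):
        shutil.copy(src, dst)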

I hope this helps you two!

Cheers, Patrick

GKG1312 commented 1 year ago

Hi @PatrickTUM, thank you for the help. I tried the first method you mentioned above, but I cannot get cloud-removed images. Also, the number of output images is smaller than the number of inputs (for example, if I give 45 images for prediction, I get only 42 back), and the order of the results is unknown because the output naming differs from the inputs; I need this ordering to re-patch my image after cloud removal.

starcksi commented 7 months ago

Hi @CatalinVulcan and @GKG1312, did you manage to get it running on a custom dataset?

someshfengde commented 2 months ago

Hi @CatalinVulcan @GKG1312 @starcksi

Was anyone able to get it running on a custom dataset? It would be great if you could share a Docker or Colab example for replication.