NRCan / geo-deep-learning

Deep learning applied to georeferenced datasets
https://geo-deep-learning.readthedocs.io/en/latest/
MIT License
149 stars 49 forks

Issues with using the NIR band in the latest version #212

Closed: krishanr closed this 1 year ago

krishanr commented 2 years ago

Hi all,

I have two issues to report when working with 4-band 16-bit imagery in version 6852729, using a config file similar to https://github.com/NRCan/geo-deep-learning/blob/v.1.2.0/conf/development/config_test_4channels_implementation.yaml.

All the steps up to inference.py can be run without error, but inference.py reports the following error:

    2021-11-02 22:55:13,208 root 487 [INFO][main] Number of cuda devices requested: 2. Cuda devices available: {0: {'used_ram_at_init': 1705.75, 'max_ram': 11019.4375}, 1: {'used_ram_at_init': 1171.75, 'max_ram': 11019.4375}}. Using 0
    2021-11-02 22:55:13,964 root 238 [INFO][net] Finetuning pretrained deeplabv3 with 4 input channels (imagery bands). Concatenation point: "conv1"
    2021-11-02 22:55:13,964 root 85 [INFO][load_checkpoint] => loading model 'data/raw/mining7/samples256_overlap0_min-annot0_4bands_mon25/model/mining7/checkpoint.pth.tar'
    2021-11-02 22:55:15,840 root 413 [WARNING][readcsv] Unable to sort csv rows
    Validating imagery: 100%|█████████████████████████████████████| 6/6 [00:00<00:00, 60205.32it/s]
    2021-11-02 22:55:15,841 root 526 [INFO][main] Successfully validated imagery
    Traceback (most recent call last):
      File "inference.py", line 691, in <module>
        main(params)
      File "inference.py", line 543, in main
        model, _ = load_from_checkpoint(loaded_checkpoint, model)
      File "geo-deep-learning/utils/utils.py", line 74, in load_from_checkpoint
        model.load_state_dict(checkpoint['model'], strict=strict_loading)
      File "anaconda3/envs/geo_deep_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for LayersEnsemble:
        size mismatch for conv1x1.weight: copying a param with shape torch.Size([2048, 4096, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
        size mismatch for conv1x1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([64]).
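The size mismatch above comes from strict state_dict loading, which compares each checkpoint parameter's shape against the freshly built model. A minimal pure-Python sketch of that comparison (not the actual torch code; shapes copied from the traceback):

```python
# Illustrative sketch only: mimics the shape check that strict state_dict
# loading performs. The parameter shapes mirror the RuntimeError above.

def check_state_dict(model_shapes, checkpoint_shapes):
    """Return size-mismatch messages, as strict loading would collect them."""
    errors = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            errors.append(
                f"size mismatch for {name}: checkpoint {ckpt_shape}, "
                f"model {model_shape}"
            )
    return errors

# Checkpoint was built with conc_point='layer4' (conv1x1 maps 4096 -> 2048);
# the model was rebuilt at inference with the default 'conv1' (128 -> 64).
checkpoint = {"conv1x1.weight": (2048, 4096, 1, 1), "conv1x1.bias": (2048,)}
model = {"conv1x1.weight": (64, 128, 1, 1), "conv1x1.bias": (64,)}

for msg in check_state_dict(model, checkpoint):
    print(msg)
```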

The above error occurs because the LayersEnsemble model is rebuilt at inference with concatenation point 'conv1', while the checkpoint was trained with 'layer4', the value set in the yaml configuration file. This can be fixed by modifying line 496 in inference.py like so:

  model, loaded_checkpoint, model_name = net(model_name=model_name,
                                             num_bands=num_bands,
                                             num_channels=num_classes_backgr,
                                             dontcare_val=dontcare_val,
                                             num_devices=1,
                                             net_params=params,
                                             inference_state_dict=state_dict,
                                             conc_point=params['global']['concatenate_depth'])

Once the above change is made to inference.py, the inferences are generated appropriately.
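For reference, the concatenation point is read from the global section of the yaml config via `params['global']['concatenate_depth']`; a hedged sketch of the relevant key (names follow that lookup, the exact layout may differ between versions):

```yaml
# Illustrative config fragment; only concatenate_depth is confirmed by
# the lookup used in the fix above.
global:
  concatenate_depth: 'layer4'  # must match the concatenation point the checkpoint was trained with
```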

However, one last issue remains: no normalization is applied to the images. The model then starts to learn the ignored class (with value -1) in addition to the target class (we're doing binary segmentation here). Any ideas on how to prevent the model from learning the ignored class?
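A per-band normalization step could be sketched as follows; this is a generic example for 4-band 16-bit imagery, not the geo-deep-learning code, and the means/stds are placeholder values that should come from the training dataset statistics:

```python
import numpy as np

def normalize_bands(chip, means, stds, scale=65535.0):
    """Scale uint16 values to [0, 1], then standardize each band.

    chip: array of shape (bands, H, W); means/stds: one value per band.
    """
    chip = chip.astype(np.float32) / scale               # 16-bit -> [0, 1]
    means = np.asarray(means, dtype=np.float32).reshape(-1, 1, 1)
    stds = np.asarray(stds, dtype=np.float32).reshape(-1, 1, 1)
    return (chip - means) / stds

# Example: a random 4-band chip with assumed dataset statistics.
chip = np.random.randint(0, 65536, size=(4, 256, 256), dtype=np.uint16)
normed = normalize_bands(chip, means=[0.4] * 4, stds=[0.2] * 4)
print(normed.shape)  # (4, 256, 256)
```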

valhassan commented 2 years ago

Hi Krishan,

What is the shape of the final output layer?

krishanr commented 2 years ago

Using concatenate_depth 'layer4', the last conv1x1 layer is Conv2d(4096, 2048, kernel_size=(1, 1), stride=(1, 1)). Also, for an input tensor of shape (1, 4, 256, 256) (initialized with torch.randn), the model's output shape is [1, 2, 256, 256].
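Those shapes are consistent with the standard Conv2d output-size rule; a quick torch-free sketch, for illustration only:

```python
# Standard Conv2d output-size rule along one spatial dimension:
#   out = (in + 2 * padding - kernel) // stride + 1

def conv2d_out(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1

# A 1x1 convolution with stride 1 (like the conv1x1 layer above) preserves
# spatial size and only changes the channel count (4096 -> 2048 here).
print(conv2d_out(256, kernel=1))             # 256
print(conv2d_out(256, kernel=3, padding=1))  # 256 (a 'same' 3x3 conv)
```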

I remember having this issue with the pretrained_unet model when I turned off normalization. One possibility is to reincorporate normalization for the 4-band imagery, since according to the deeplabv3 documentation the input images should be normalized.

valhassan commented 2 years ago

For debugging purposes, I suggest using unet_pretrained and skipping the concatenation involved with deeplabv3.

remtav commented 1 year ago

There are ongoing discussions about removing deeplabv3 with NIR injection. See #218