gist-ailab / uoais

Code for the paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling", ICRA 2022

Passing RGB and Depth image to network #9

Closed andreaceruti closed 2 years ago

andreaceruti commented 2 years ago

Hi, I want to combine two backbones as you do, one RGB backbone and one depth backbone, and at the end fuse their features (in my case before feeding the fused features to the RPN and ROI stages of the classical Mask R-CNN). The problem is that I can't understand how, through the dataset mapper, the first 3 channels are passed to the RGB backbone and the last 3 channels to the depth backbone. I can see that you concatenate the channels, but I can't find the point where this numpy array is split and the resulting tensors are passed to the corresponding backbones. Can you point me to the code implementation where this happens?

SeungBack commented 2 years ago

Hi, please check the following lines. `x` is the backbone input coming from the dataset mapper:

https://github.com/gist-ailab/uoais/blob/fb42d9a96cd54daad61c956d8d9d65dd0ebef4c7/adet/modeling/backbone/rgbdfpn.py#L269-L270
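In outline, the pattern at those lines is: the dataset mapper concatenates RGB and depth along the channel axis, and the RGB-D FPN's forward slices the tensor back apart before each backbone. A minimal sketch with NumPy standing in for PyTorch tensors (shapes are illustrative, not the repo's actual ones):

```python
import numpy as np

# Toy stand-ins: batch of 2 RGB images and 2 three-channel depth maps.
rgb = np.random.rand(2, 3, 480, 640).astype(np.float32)
depth = np.random.rand(2, 3, 480, 640).astype(np.float32)

# The dataset mapper concatenates along the channel axis -> 6-channel input.
x = np.concatenate([rgb, depth], axis=1)
assert x.shape == (2, 6, 480, 640)

# Inside the FPN forward, the input is sliced back per backbone.
rgb_in = x[:, :3, :, :]    # first 3 channels -> RGB backbone
depth_in = x[:, 3:, :, :]  # last 3 channels  -> depth backbone
assert rgb_in.shape == (2, 3, 480, 640)
assert depth_in.shape == (2, 3, 480, 640)
```

The same slicing works unchanged on `torch.Tensor` inputs, since PyTorch follows NumPy indexing semantics here.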

andreaceruti commented 2 years ago

@SeungBack Thank you!!

andreaceruti commented 2 years ago

Hi @SeungBack, sorry to bother you again. If I want to use just one channel for the depth image, can I change this piece of code as follows?

```python
if self.bottom_up is not None:
    self.bottom_up_rgb_features, self.bottom_up_depth_features = self.bottom_up(x)
else:
    self.bottom_up_rgb_features = self.bottom_up_rgb(x[:, :3, :, :])
    self.bottom_up_depth_features = self.bottom_up_depth(x[:, 3, :, :])  # 4th channel: depth

bottom_up_features = {}
for i, k in enumerate(self.bottom_up_rgb_features.keys()):
    if k in self.in_features:
        k_depth = k[:3] + "_" + k[3:]  # e.g. "res0" to "res_0"
        if self._rgbd_fuse_type == "conv":
            bottom_up_feature = torch.cat([self.bottom_up_rgb_features[k],
                                           self.bottom_up_depth_features[k_depth]], 1)
            bottom_up_features[k] = self.fusion_layers[i](bottom_up_feature)
        elif self._rgbd_fuse_type == "add":
            bottom_up_features[k] = self.bottom_up_rgb_features[k] + self.bottom_up_depth_features[k_depth]
```

I'm not sure whether the second part of the code could cause problems. I have added the correct output feature names in config/defaults.py.
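One likely problem with the snippet above (a hedged note; this follows NumPy indexing semantics, which PyTorch shares): indexing the channel axis with a plain integer, `x[:, 3, :, :]`, drops that axis and yields a 3D tensor, while a convolutional backbone expects 4D NCHW input. Slicing with `3:4` (or `3:`) keeps the singleton channel axis:

```python
import numpy as np

# Toy 4-channel RGB-D input: RGB plus one depth channel.
x = np.random.rand(2, 4, 480, 640).astype(np.float32)

d_int = x[:, 3, :, :]      # integer index drops the channel axis
d_slice = x[:, 3:4, :, :]  # slice keeps a singleton channel axis

assert d_int.shape == (2, 480, 640)       # 3D: would break a Conv2d stem
assert d_slice.shape == (2, 1, 480, 640)  # 4D NCHW, as the backbone expects
```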

SeungBack commented 2 years ago

Can you provide the details of the error?

You would need to change the depth backbone layer to allow the single channel input.
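To make that concrete (a sketch with hypothetical names; in a Detectron2-style ResNet the relevant layer is the stem's first convolution): a conv weight has shape `(out_channels, in_channels, kH, kW)`, so the depth backbone's stem must be built with `in_channels=1` rather than 3 to accept a single-channel depth map. A NumPy shape check illustrates the constraint:

```python
import numpy as np

def conv_out_shape(x_shape, w_shape, stride=2, pad=3):
    """Output shape of a 2D convolution: NCHW input, OIHW weight."""
    n, c_in, h, w = x_shape
    c_out, c_in_w, kh, kw = w_shape
    assert c_in == c_in_w, f"input has {c_in} channels, conv expects {c_in_w}"
    return (n, c_out,
            (h + 2 * pad - kh) // stride + 1,
            (w + 2 * pad - kw) // stride + 1)

depth_in = (2, 1, 480, 640)   # single-channel depth input
stem_rgb = (64, 3, 7, 7)      # stock ResNet stem: expects 3 channels
stem_depth = (64, 1, 7, 7)    # modified stem: expects 1 channel

try:
    conv_out_shape(depth_in, stem_rgb)
except AssertionError as e:
    print("mismatch:", e)     # the stock 3-channel stem rejects 1-channel input

print(conv_out_shape(depth_in, stem_depth))  # (2, 64, 240, 320)
```

In PyTorch the equivalent change would be constructing the stem as `nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)` instead of passing `in_channels=3`.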

andreaceruti commented 2 years ago

Sorry for the confusion. In the end I managed to feed a single grayscale depth image to the depth backbone instead of the 3 depth channels used for images such as those from NyuDatasetV2.