Claud1234 / CLFT

This is the repository for FCN- and Transformer-based object segmentation that relies on the fusion of camera and LiDAR data.

Issues related to RGB and LiDAR fusion #3

Open AirPlanBird opened 3 weeks ago

AirPlanBird commented 3 weeks ago

Hello, thank you for your contribution in this area. While reading the code, I found that the ViT output taken through the hook calls seems to use only the LiDAR input. My own changes produced some errors, so I would appreciate some advice from the author.

Claud1234 commented 3 weeks ago

@AirPlanBird I don't quite get what you mean. You can share the code you modified here, or we could talk in private. I am working on QQ at the moment, but I am always available on Skype: https://join.skype.com/invite/hign90IeLxg0

AirPlanBird commented 3 weeks ago

Hello, thanks for the reply. I found the following part while looking at the code. There seems to be a problem: the rgb input does not appear to be used anywhere. I may just be new to hooks, and I apologize if my understanding is off.


def forward(self, rgb, lidar, modal='rgb'):
    t = self.transformer_encoders(lidar)
    previous_stage = None
    for i in np.arange(len(self.fusions)-1, -1, -1):
        hook_to_take = 't'+str(self.hooks[i])
        activation_result = self.activation[hook_to_take]
        if modal == 'rgb':
            reassemble_result_RGB = self.reassembles_RGB[i](activation_result) # claude check here
            reassemble_result_XYZ = torch.zeros_like(reassemble_result_RGB) # this is just to keep the space allocated but it will not be used later in fusion
        if modal == 'lidar':
            reassemble_result_XYZ = self.reassembles_XYZ[i](activation_result) # claude check here
            reassemble_result_RGB = torch.zeros_like(reassemble_result_XYZ) # this is just to keep the space allocated but it will not be used later in fusion
        if modal == 'cross_fusion':
            reassemble_result_RGB = self.reassembles_RGB[i](activation_result) # claude check here
            reassemble_result_XYZ = self.reassembles_XYZ[i](activation_result) # claude check here

        fusion_result = self.fusions[i](reassemble_result_RGB, reassemble_result_XYZ, previous_stage, modal) 
        previous_stage = fusion_result
    out_depth = None
    out_segmentation = None
    if self.head_depth != None:
        out_depth = self.head_depth(previous_stage)
    if self.head_segmentation != None:
        out_segmentation = self.head_segmentation(previous_stage)
    return out_depth, out_segmentation
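
For reference, here is a minimal sketch of how such an activation dictionary is usually filled with PyTorch forward hooks. This is an assumption about the mechanism, not the repository's exact code; HookedEncoder and all names in it are illustrative. It shows why every hooked tensor is derived from whatever single input is passed to the encoder, which is the behaviour being questioned above:

import torch
import torch.nn as nn

class HookedEncoder(nn.Module):
    """Toy encoder that stores intermediate outputs via forward hooks (illustrative only)."""
    def __init__(self, hooks=(2, 5, 8, 11)):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(12)])
        self.activation = {}
        for idx in hooks:
            # register a hook that saves the block's output under the key 't<idx>'
            self.blocks[idx].register_forward_hook(self._save_output('t' + str(idx)))

    def _save_output(self, name):
        def hook(module, inputs, output):
            self.activation[name] = output  # stored for later reassembly stages
        return hook

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

enc = HookedEncoder()
_ = enc(torch.randn(1, 64))          # only one modality is fed to the encoder
print(list(enc.activation.keys()))   # every saved activation comes from that single input

If the hooks are registered on transformer_encoders, then because only lidar is passed to it in the snippet above, both self.reassembles_RGB[i] and self.reassembles_XYZ[i] would be operating on LiDAR-derived activations.
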
palpitatingaaa commented 2 weeks ago

I don't understand this part either (the same forward snippet quoted above), and I hope the author can answer it.

Claud1234 commented 2 weeks ago

The concatenation of the camera and LiDAR data happens in clft/fusion.py.
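
For readers landing here, a minimal sketch of what a concatenation-style fusion stage can look like. This is an assumption for illustration only, not the actual implementation in clft/fusion.py; FusionStage, its parameters, and the channel sizes are hypothetical:

import torch
import torch.nn as nn

class FusionStage(nn.Module):
    """Hypothetical per-stage fusion block (not the repository's clft/fusion.py)."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolution projecting the concatenated camera+LiDAR features back to `channels`
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_rgb, feat_xyz, previous_stage, modal):
        if modal == 'rgb':
            fused = feat_rgb                 # single modality: pass camera features through
        elif modal == 'lidar':
            fused = feat_xyz                 # single modality: pass LiDAR features through
        else:                                # 'cross_fusion': concatenate along channels, then project
            fused = self.project(torch.cat([feat_rgb, feat_xyz], dim=1))
        if previous_stage is not None:
            fused = fused + previous_stage   # accumulate the result of the previous decoder stage
        return fused

# usage sketch with made-up shapes
stage = FusionStage(channels=256)
rgb_feat = torch.randn(1, 256, 24, 24)
xyz_feat = torch.randn(1, 256, 24, 24)
out = stage(rgb_feat, xyz_feat, None, 'cross_fusion')   # -> torch.Size([1, 256, 24, 24])
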