ethz-asl / TULIP

MIT License
18 stars 3 forks

a training detail of ILN #2

Closed zcliangyue closed 7 hours ago

zcliangyue commented 5 days ago

Hi Bin Yang,

I have a question regarding model training. In ILN, the query rays are defined by the vertical angle range of the lidar and the number of lines. However, unlike your approach of uniform downsampling (which changes the vertical angle range), ILN samples data at different resolutions while keeping the same vertical angle range.

Specifically, the following code illustrates the difference:

if down_sample:
    # Derive the low-resolution sensor model from the high-resolution one.
    lidar_in = copy.copy(lidar_out)
    lidar_in['channels'] = np.int16(lidar_out['channels'] / downsample_rate[0])
    v_res = (lidar_out['max_v'] - lidar_out['min_v']) / (lidar_out['channels'] - 1)
    # The first retained line sits (downsample_rate[0] - 1) rows above the
    # original bottom line, so the minimum vertical angle must shift up too.
    lidar_in['min_v'] = lidar_out['min_v'] + v_res * (downsample_rate[0] - 1)
    lidar_in['points_per_ring'] = np.int16(lidar_out['points_per_ring'] / downsample_rate[1])
else:
    # Read the input resolution directly from the config, e.g. '16_1024'.
    lidar_in = copy.deepcopy(lidar_out)
    lidar_in['channels'] = int(config['res_in'].split('_')[0])
    lidar_in['points_per_ring'] = int(config['res_in'].split('_')[1])

# Generate query rays for the output sensor and normalize them
# with respect to the (adjusted) input sensor model.
coord = generate_laser_directions(lidar_out)
coord = torch.unsqueeze(torch.tensor(normalization_queries(coord, lidar_in)), 0)

As you can see, when creating new query rays after downsampling, the minimum vertical angle has to be adjusted. In ILN's experimental setup, only the number of lidar lines changes (e.g., from 64 to 16), while the angle range stays fixed. Ignoring this aspect can lead to incorrect neighborhood selection and weight prediction. I would like to know whether you considered this when training ILN.
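
To make the difference concrete, here is a minimal sketch. The sensor values are placeholders chosen for illustration (not taken from either codebase), and I assume uniform downsampling keeps the rows with indices rate-1, 2*rate-1, ..., which is the convention implied by the min_v shift above:

import numpy as np

# Hypothetical sensor; the exact angles are placeholders for illustration.
min_v, max_v, channels = -24.8, 2.0, 64
rate = 4  # 64 -> 16 lines

v_res = (max_v - min_v) / (channels - 1)
full_angles = np.linspace(min_v, max_v, channels)

# Rows actually retained by uniform downsampling:
kept_angles = full_angles[rate - 1 :: rate]

# Query rays ILN would regenerate with the *original* min_v:
iln_angles = np.linspace(min_v, max_v, channels // rate)

print(kept_angles[0] - iln_angles[0])  # v_res * (rate - 1), about 1.28 deg, not 0

# Shifting min_v as in the snippet above reproduces the retained rows exactly:
corrected = np.linspace(min_v + v_res * (rate - 1), max_v, channels // rate)
print(np.allclose(kept_angles, corrected))  # True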

Thank you for your patience and I look forward to your response.

binyang97 commented 2 days ago

Hi, thank you for your question.

Yes, since ILN is not a pixel-wise super-resolution method, we recognized that using the same downsampling scheme is not suitable. So for all of this training, we used the implementation from their source code. The only change is the data loader for our custom dataset (loading the 64-beam data). For all other pre-processing steps, including data augmentation and transformations, we used the default settings of each approach.

Best, Bin

zcliangyue commented 2 days ago

Thank you very much for your response. I would like to confirm one point: when creating the query rays for ILN, did you account for the change in minimum vertical angle caused by downsampling? The ILN source code does not take this into account, as it is only trained on synthetic datasets. I trained ILN on two datasets (CARLA and KITTI) and found that the accuracy I achieved was significantly higher than what is reported in your paper, so I suspect there might be some differences in implementation details.

Apologies for any inconvenience caused. Looking forward to your response.

binyang97 commented 2 days ago

Do you mean the difference in FoV between the two datasets? If so, we changed the sensor configuration for the KITTI dataset, and also for DurLAR. As you said, ILN was actually only trained on a synthetic dataset. Beyond that, we did not make further modifications to the code. Do you obtain the improvement on both datasets? Because at least on CARLA, I don't think we trained it in a wrong way. In the supplementary material, we also trained ILN with different upsampling rates and actually obtained much better results than the original paper.

zcliangyue commented 2 days ago

No, I am referring to the difference in FoV between different resolutions of the same dataset. ILN creates the vertical angle of each scan line with

np.linspace(start=lidar['min_v'], stop=lidar['max_v'], num=lidar['channels'])

However, uniform downsampling changes min_v. I have drawn a simple diagram: the upper part shows the downsampling method used in your paper, and the lower part shows the separate sampling method adopted by ILN.
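
To put a rough number on the neighborhood mismatch, here is a small check that reuses the hypothetical sensor values from my earlier sketch (again placeholders, not the real configuration):

import numpy as np

min_v, max_v, channels, rate = -24.8, 2.0, 64, 4

full = np.linspace(min_v, max_v, channels)             # output query rays
kept = full[rate - 1 :: rate]                          # true angles of the retained scan lines
assumed = np.linspace(min_v, max_v, channels // rate)  # angles ILN assigns to those rows

# Index of the scan line just below each query ray, under the
# true input angles vs. the angles ILN assumes:
below_true = np.searchsorted(kept, full) - 1
below_assumed = np.searchsorted(assumed, full) - 1
print((below_true != below_assumed).sum(), "of", channels,
      "query rays select a different lower neighbor")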

binyang97 commented 2 days ago

I see. For that, we did not adjust min_v after downsampling. We use the same configuration for the high- and low-resolution data from the same dataset.

zcliangyue commented 2 days ago

Thank you for your patient response. I think this explains why, in your results, ILN is the only method unable to match the characteristic line pattern of the LiDAR on the ground.

binyang97 commented 2 days ago

No problem, and thank you for the discussion. By the way, I'm not sure whether that's the main reason for ILN's reconstruction result. The line patterns are not shifted but upsampled repeatedly, and we found that this can be observed throughout the scene, especially in regions far from the sensor origin, which is due to the sparsity of points.

zcliangyue commented 2 days ago

I think I can explain this phenomenon briefly: for each query ray, ILN finds the four nearest neighboring points on the range image. Suppose there are three adjacent scan lines on the ground, $i$, $j$, $k$, with corresponding distances $r_i < r_j < r_k$. If a query ray that originally lies between scan lines $i$ and $j$ is incorrectly moved to between $j$ and $k$, the interpolation cannot be optimized, because it is a convex combination of the neighboring distances, i.e., a value between $r_j$ and $r_k$, while the ground truth lies between $r_i$ and $r_j$. The best the network can do is push the prediction to $r_j$, and the visual effect is that the same scan line is upsampled repeatedly.
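
To state the bound behind this argument in one line (writing $\hat{r}$ for the predicted distance and $w_n$ for the learned interpolation weights): since $\hat{r} = \sum_n w_n r_n$ with $w_n \ge 0$ and $\sum_n w_n = 1$, we always have $\min_n r_n \le \hat{r} \le \max_n r_n$. So if all selected neighbors lie on lines $j$ and $k$, no choice of weights can reach a ground truth below $r_j$, and the loss-minimizing prediction is the boundary value $\hat{r} = r_j$.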

zcliangyue commented 2 days ago

I have drawn a diagram for it. The red line corresponds to the situation I explained in this comment.

binyang97 commented 2 days ago

Thank you for the detailed explanation! The diagram is also intuitive. Learned a lot :)