dschinagl / occam

Demo code for the paper: OccAM's Laser
BSD 3-Clause "New" or "Revised" License

Inquiry Regarding Paper and Code Adjustments for nuScenes Dataset #9

Closed KexuanXia closed 5 months ago

KexuanXia commented 6 months ago

Hi,

First, I would like to express my appreciation for your impressive work. I have been exploring your paper and the associated code and have a few questions that I hope you could help clarify.

In the paper, you mentioned that setting N=3000 iterations strikes a balance between runtime and the quality of the attribution maps. However, I noticed that the demo is set at 6000 iterations. Could you help me understand the reason for this difference?

Additionally, I am keen on adapting your approach for the nuScenes dataset. As a preliminary step, I'm considering adjusting the parameters a, b, c, and lambda in the OccAM model configuration. I've seen that there's already a discussion on this topic under the thread "Questions about mean voxel density #3". In your response, you mentioned using a few hundred samples to determine the mean voxel density. Could you share more about how this was achieved? Did you utilize any open-source libraries or was it an entirely custom development?

Besides modifying the values of a, b, c, and lambda, are there other adjustments that are essential for successfully applying this method to the nuScenes dataset?

Thank you for your time and assistance. I look forward to your insights.

Best regards, Kexuan Xia

dschinagl commented 6 months ago

Hi,

thank you for your interest in our work!

In the demo code, we used N=6000 to make the importance of each point even more prominent. However, essentially the same information about the important points can already be obtained with significantly fewer iterations.

To determine the mean voxel density for a new dataset, the following steps should be performed on several hundred LiDAR frames (a few hundred frames are usually sufficient for the fitted function to converge):

  1. Voxelization of the point cloud using spconv.
  2. Calculation of the center coordinates for the non-empty voxels.
  3. Computation of the pairwise distances between the center coordinates of all non-empty voxels using scipy.spatial.distance.cdist.
  4. For all non-empty voxels, computation of the number of neighboring non-empty voxels within a given radius and normalization using the maximum number of possible neighbor voxels.
  5. Storage of this normalized density value together with the distance between the LiDAR sensor and the corresponding non-empty voxel.

Finally, the distance range to cover is divided into 0.5m steps and the mean voxel density is calculated for each bin. These data points are then used to fit a function of the form f(distance) = 1 / (a * distance^2 + b * distance + c) using scipy.optimize.curve_fit.
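
A minimal Python sketch of these steps could look as follows. Note that the voxel size, point cloud range, neighborhood radius, curve_fit initial guess and the lidar_frames iterable are illustrative placeholders rather than the exact values we used, and the zyx coordinate order returned by spconv is an assumption:

import numpy as np
import torch
from scipy.spatial.distance import cdist
from scipy.optimize import curve_fit
from spconv.pytorch.utils import PointToVoxel

# Illustrative parameters only -- not the exact values used in the paper
VOXEL_SIZE = 0.2      # cubic voxel edge length [m]
RADIUS = 1.0          # neighborhood radius [m]
BIN_WIDTH = 0.5       # distance bin width [m]
PC_RANGE_MIN = np.array([0.0, -40.0, -3.0])   # assumed KITTI-like range (xyz min)

voxel_gen = PointToVoxel(
    vsize_xyz=[VOXEL_SIZE] * 3,
    coors_range_xyz=[0, -40, -3, 70.4, 40, 1],
    num_point_features=3,
    max_num_voxels=150000,
    max_num_points_per_voxel=5,
    device=torch.device("cpu"),   # CPU implementation, as in pcdet
)

# maximum number of voxel centers that can lie within RADIUS of a given voxel center
grid = np.arange(-int(np.ceil(RADIUS / VOXEL_SIZE)), int(np.ceil(RADIUS / VOXEL_SIZE)) + 1)
offsets = np.stack(np.meshgrid(grid, grid, grid), -1).reshape(-1, 3) * VOXEL_SIZE
max_neighbors = int(np.sum(np.linalg.norm(offsets, axis=1) <= RADIUS)) - 1  # exclude self

densities, distances = [], []
for frame in lidar_frames:   # hypothetical iterable of (N, 3) numpy arrays, a few hundred frames
    # 1. voxelization of the point cloud
    _, coords, _ = voxel_gen(torch.from_numpy(frame[:, :3]).float())
    # 2. center coordinates of the non-empty voxels (coords assumed to be in zyx order)
    centers = coords.numpy()[:, ::-1] * VOXEL_SIZE + PC_RANGE_MIN + VOXEL_SIZE / 2
    # 3./4. pairwise distances (chunked to bound the O(M^2) memory of cdist) and
    #       neighbor counts within RADIUS, normalized by the maximum possible number
    n_neigh = np.empty(len(centers))
    for s in range(0, len(centers), 2048):
        d = cdist(centers[s:s + 2048], centers)
        n_neigh[s:s + 2048] = (d <= RADIUS).sum(axis=1) - 1
    densities.append(n_neigh / max_neighbors)
    # 5. store the density together with the distance of each voxel to the sensor
    distances.append(np.linalg.norm(centers, axis=1))

densities = np.concatenate(densities)
distances = np.concatenate(distances)

# bin by distance in 0.5 m steps and average the density per bin
bins = np.arange(0.0, distances.max() + BIN_WIDTH, BIN_WIDTH)
bin_idx = np.digitize(distances, bins)
used = np.unique(bin_idx)
bin_centers = bins[used - 1] + BIN_WIDTH / 2
mean_density = np.array([densities[bin_idx == i].mean() for i in used])

# fit f(distance) = 1 / (a * distance^2 + b * distance + c)
def f(dist, a, b, c):
    return 1.0 / (a * dist ** 2 + b * dist + c)

(a, b, c), _ = curve_fit(f, bin_centers, mean_density, p0=(0.01, 0.01, 1.0))
print("fitted parameters:", a, b, c)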

The sampling probability, i.e. the probability that a voxel is not masked, is then computed as lambda * (1/f(distance)). The specific value of lambda has been empirically determined to ensure that the average similarity score is about 0.3, as shown in Figure 8 of the paper.
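
As a small illustration of this last step (the clipping to [0, 1] and any concrete lambda value are placeholders, not prescribed values):

import numpy as np

def sampling_probability(dist, a, b, c, lam):
    # f(distance) = 1 / (a * d^2 + b * d + c) is the fitted voxel density,
    # so lam * (1 / f(dist)) = lam * (a * d^2 + b * d + c)
    voxel_density = 1.0 / (a * dist ** 2 + b * dist + c)
    return np.clip(lam * (1.0 / voxel_density), 0.0, 1.0)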

In summary, for a new dataset you need to approximate the function describing how the voxel density decreases with distance (since it depends on the LiDAR sensor characteristics) and to set the parameter lambda (which depends on the detector) such that the detections just begin to degrade due to the masking.

I hope this helps.

With best regards David

KexuanXia commented 5 months ago

Hi,

I would like to sincerely thank you for your detailed and patient answer!

I believe it is worthwhile for me to reproduce the determination of the mean voxel density step by step on KITTI first. If my result matches yours, I will repeat the procedure on a new dataset. Unfortunately, I am stuck at the first step.

I take the point cloud "000001.bin" from KITTI as input and use the PointToVoxel class from spconv to voxelize it:

import numpy as np
import torch
from spconv.pytorch.utils import PointToVoxel

# read point cloud and drop the 4th column since intensity is not necessary for voxelization
source_file_path = 'demo_data/000001.bin'
if source_file_path.split('.')[-1] == 'bin':
    points = np.fromfile(source_file_path, dtype=np.float32)
    points = points.reshape(-1, 4)[:, :3]
elif source_file_path.split('.')[-1] == 'npy':
    points = np.load(source_file_path)[:, :3]
else:
    raise NotImplementedError

# number of total points
print("number of points: ", points.shape[0])

# transfer np into torch
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
points = torch.from_numpy(points).float().to(device)

# initialize PointToVoxel
point_to_voxel = PointToVoxel(
    vsize_xyz=[0.2, 0.2, 0.2],  # voxel size
    coors_range_xyz=[-80, -80, -10, 80, 80, 50],  # coordinate ranges
    num_point_features=3,  # number of point features
    max_num_voxels=20000,  # maximum voxels, the same as the setting of your work
    max_num_points_per_voxel=200,  # maximum points in each voxel, the same as the setting of your work
    device=torch.device("cuda:0")  # GPU
)

voxels, indices, num_points_per_voxel = point_to_voxel(points)

I have two questions about the output num_points_per_voxel:

  1. To the best of my knowledge, the total number of points should be the same, or at least quite similar, before and after voxelization. However, I saw 122k points before voxelization but only 71k after. If I set max_num_voxels=40000, the problem disappeared, but it should be max_num_voxels=20000 according to the config file kitti_pointpillar.yaml in your work.
  2. Mean voxel density is defined as the mean percentage of non-empty voxels within 1 m, depending on the distance. So, if I understand correctly, there should be some voxels which contain no points, which means there should be some zero elements in the array num_points_per_voxel. However, my output showed that all entries of num_points_per_voxel were non-zero, which really confuses me.
    number of points:  122109
    Number of Points after Voxelization: 71198
    Number of zero voxels: 0

Thank you for reading this far. I wish you a good day.

Best Regards, Kexuan Xia

dschinagl commented 5 months ago

Hi,

Yes, that is correct: the number of points before and after voxelization must be the same, i.e. equal to the sum of num_points_per_voxel. A few comments on this. First, it seems that you did not crop the point cloud to the camera frustum as is usually done in the pcdet dataloader; therefore the maximum number of voxels is too small in this case. Second, I use the CPU implementation of spconv, like pcdet, because I had some problems with the GPU implementation. And third, you can alternatively use the generate_voxel_with_id function (https://github.com/traveller59/spconv/blob/2.1.x/spconv/pytorch/utils.py) to check where points are lost.
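
A minimal sketch of such a check could look like this (assuming spconv 2.x on the CPU; the assumption here is that the per-point voxel id returned by generate_voxel_with_id is -1 for dropped points, which is worth verifying against your spconv version):

import numpy as np
import torch
from spconv.pytorch.utils import PointToVoxel

points = np.fromfile('demo_data/000001.bin', dtype=np.float32).reshape(-1, 4)[:, :3]
pts = torch.from_numpy(points).float()

voxel_gen = PointToVoxel(
    vsize_xyz=[0.2, 0.2, 0.2],
    coors_range_xyz=[-80, -80, -10, 80, 80, 50],
    num_point_features=3,
    max_num_voxels=40000,                 # large enough for the uncropped cloud
    max_num_points_per_voxel=200,
    device=torch.device("cpu"),           # CPU implementation, as in pcdet
)

# generate_voxel_with_id additionally returns, for every input point, the id of
# the voxel it ended up in (assumed to be -1 for points that were dropped)
voxels, coords, num_points_per_voxel, pc_voxel_id = voxel_gen.generate_voxel_with_id(pts)

print("points in:        ", pts.shape[0])
print("points in voxels: ", int(num_points_per_voxel.sum()))
print("dropped points:   ", int((pc_voxel_id < 0).sum()))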

Regarding the second question: Yes, the majority of voxels should be empty. However, you can also use the number of all possible voxels within the radius to compute the density.

Best regards David

KexuanXia commented 5 months ago

Hi,

Thank you so much for your thorough and considerate response!

I did not crop the point cloud; I will do it and see whether this is the only cause of the problem. Could you please tell me why 42 degrees is used to crop the point cloud? Is it because this corresponds to the field of view of the cameras used to capture the KITTI images?

I wish you a wonderful week.

Best regards Kexuan Xia

dschinagl commented 5 months ago

Hi,

the 42 degrees is just an approximation to make the demo work without the corresponding KITTI info files.
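
Such an approximate crop could look like the following sketch (this is only an illustration and assumes the 42 degrees is the half-angle around the forward x-axis; the demo code may use a different convention):

import numpy as np

def approx_frustum_crop(points, half_fov_deg=42.0):
    # keep only points whose azimuth lies within +/- half_fov_deg of the forward x-axis
    azimuth = np.degrees(np.arctan2(points[:, 1], points[:, 0]))
    return points[np.abs(azimuth) < half_fov_deg]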

However, this is not a prerequisite for determining the voxel density; you just need to increase the maximum number of voxels in the voxel generator, as you already noted.

Best regards David