V2AI / Det3D

World's first general-purpose 3D object detection codebase.
https://arxiv.org/abs/1908.09492
Apache License 2.0
1.48k stars 299 forks

A problem in the paper "Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection" #103

Closed by Kins1ley 4 years ago

Kins1ley commented 4 years ago

In Section 2.1 Input and Augmentation

I am confused by the sentence "So we randomly sample 10% of 128106 (12810) point cloud samples for each category from the class-specific samples mentioned above." What does this mean? The rest of the paper does not seem to explain it further.

As for Figure 2, do the blue and orange columns share the same vertical axis?

poodarchu commented 4 years ago
  1. There are 10 classes in total, and we expect each class to account for about 10% of the total samples. In practice, however, a single sample contains objects of several categories, so increasing the GT count of rare classes also increases that of the major classes. The final distribution is therefore not perfectly uniform.
  2. After the sampling procedure, the dataset is actually 4.5 times larger than the original one. For a fair comparison, we divide the GT counts by 4.5, so that the GT instance distributions before and after sampling can be compared.
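The two points above can be illustrated with a minimal Python sketch. This is not the Det3D implementation: the function names, the toy dataset, and the choice to represent each sample as a plain list of GT class names are all assumptions made for illustration. It builds one pool of samples per class, draws an equal share from each pool (with replacement), and then rescales the GT counts by the dataset growth factor so the distributions before and after sampling are comparable.

```python
import random
from collections import Counter, defaultdict


def class_balanced_resample(samples, class_names, seed=0):
    """Return indices into `samples`, drawing an equal share per class.

    `samples` is a list where each entry is the list of GT class names
    found in one point cloud sample (a hypothetical, simplified format).
    """
    rng = random.Random(seed)
    # Class-specific pools: a sample appears once in the pool of every
    # class it contains, so the pools together exceed the original size.
    pools = defaultdict(list)
    for idx, labels in enumerate(samples):
        for name in set(labels):
            if name in class_names:
                pools[name].append(idx)
    total = sum(len(pool) for pool in pools.values())
    per_class = total // len(class_names)  # equal share for every class
    resampled = []
    for name in class_names:
        if pools[name]:
            # Rare classes are drawn with replacement, i.e. duplicated.
            resampled.extend(rng.choices(pools[name], k=per_class))
    return resampled


def gt_counts(samples, indices):
    """Count GT instances per class over the selected sample indices."""
    counts = Counter()
    for i in indices:
        counts.update(samples[i])
    return counts


def normalized_gt_counts(samples, indices):
    """Divide GT counts by the growth factor of the resampled dataset."""
    factor = len(indices) / len(samples)
    counts = gt_counts(samples, indices)
    return {name: n / factor for name, n in counts.items()}, factor


# Toy data (made up): "car" dominates, "bicycle" is rare.
samples = [
    ["car", "car", "pedestrian"],
    ["car"],
    ["car"],
    ["car", "bicycle"],
    ["car"],
    ["pedestrian"],
]
class_names = ["car", "pedestrian", "bicycle"]

resampled = class_balanced_resample(samples, class_names)
before = gt_counts(samples, range(len(samples)))
after, factor = normalized_gt_counts(samples, resampled)
```

In the toy data the only sample containing a bicycle also contains a car, so every extra bicycle draw adds a car as well. This is why the resulting distribution is flatter than the original but not perfectly uniform.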
Kins1ley commented 4 years ago

Thanks for your reply. Is the data sampling code in load_infos in nuscenes.py?

Kins1ley commented 4 years ago

May I ask where the data sampling idea comes from? It seems like an interesting way to address class imbalance in the dataset.

poodarchu commented 4 years ago

I thought of it by accident

pyun-ram commented 3 years ago

Is there an ablation study on the improvement from the proposed sampling? It might benefit the community a lot if you could provide some details about it. :) @poodarchu