WeijingShi / Point-GNN

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud, CVPR 2020.
MIT License

Some confusion about #83

Closed cwhYee closed 2 years ago

cwhYee commented 3 years ago

Thank you for sharing the code. Would you mind answering my questions?

  1. Why do you limit the maximum number of edges per vertex to 256 during training, but use all the edges during inference?
  2. What tools did you use for the visualizations in the paper, Open3D?
  3. In the configs directory, what do 'auto', 'fixed', 'T0' and 'T1' in the configuration file names mean?

Thanks in advance.

WeijingShi commented 3 years ago

Hi @cwhYee, Thank you for your interest.

  1. We limit the maximum number of edges during training mainly to save GPU memory and also to speed up training (see the sketch after this list).
  2. Yes, we use Open3D for visualization.
  3. For the naming: https://github.com/WeijingShi/Point-GNN/issues/2#issuecomment-599252221
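
For reference, capping the edges can be done when the graph is built. Below is a minimal NumPy sketch (illustrative only; the function name and edge layout are not the repo's actual implementation) of randomly keeping at most 256 edges per destination vertex during training, while inference would simply keep all edges:

```python
import numpy as np

def cap_edges_per_vertex(edges, max_edges=256, rng=None):
    """Randomly keep at most `max_edges` incoming edges per destination vertex.

    `edges` is an [E, 2] int array of (source_idx, destination_idx) pairs.
    During training this caps memory/compute; at inference all edges are kept.
    """
    rng = rng or np.random.default_rng()
    # Group edge indices by their destination vertex.
    order = np.argsort(edges[:, 1], kind="stable")
    first_of_each_dest = np.unique(edges[order, 1], return_index=True)[1]
    groups = np.split(order, first_of_each_dest[1:])
    kept = []
    for group in groups:
        if len(group) > max_edges:
            group = rng.choice(group, size=max_edges, replace=False)
        kept.append(group)
    return edges[np.concatenate(kept)]

# Tiny usage example: vertex 1 has three incoming edges, capped to two.
edges = np.array([[0, 1], [2, 1], [3, 1], [0, 2]])
print(cap_edges_per_vertex(edges, max_edges=2))
```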

Hope it helps.

cwhYee commented 3 years ago

@WeijingShi Thank you so much!

cwhYee commented 3 years ago

Hello @WeijingShi, I have another question. Would you mind answering it? When training on cars, 4 classes are predicted; why does only one class get a high mAP? I see the same result when training on pedestrians and cyclists. Can you help me analyze this? Thanks in advance. [screenshot attached]

WeijingShi commented 3 years ago

Hi @cwhYee,

These intermediate printouts are point-level results, i.e. the classification of points belonging to different categories.
class_idx 0 is the background class, class_idx 1 and 2 are the front-view and side-view cars, and class_idx 3 is the other objects.

The background class has far more points than the other classes. That's why it gets good numbers.

Meanwhile, the network also regresses a bounding box for each point that belongs to a valid category (in this case, cars).
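
To see why the background class dominates these point-level numbers, here is a small self-contained illustration with made-up label proportions (not actual KITTI statistics): a classifier that misses half of the rare object points still looks very good overall, simply because background points vastly outnumber the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up point-level labels where background (0) dominates heavily.
# 0: background, 1: front-view car, 2: side-view car, 3: other objects
labels = rng.choice(4, size=100_000, p=[0.96, 0.015, 0.015, 0.01])

# A classifier that misses half of the object points, calling them background.
miss = (labels != 0) & (rng.random(labels.size) < 0.5)
preds = np.where(miss, 0, labels)

for c in range(4):
    m = labels == c
    print(f"class {c}: {m.sum():6d} points, recall {(preds[m] == c).mean():.2f}")
print(f"overall point accuracy: {(preds == labels).mean():.3f}")
```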

Thanks,

cwhYee commented 3 years ago

@WeijingShi Thanks for your patient answer!

cwhYee commented 3 years ago

Hello @WeijingShi, sorry to bother you again. I tried to modify the model to use an attention mechanism, and as a beginner I have some confusion.

  1. During training, as shown in the picture, I found that GPU utilization fluctuates, and I don't know why or how to analyze it. Normally, once the data has been loaded onto the GPU, utilization should reach its maximum and stay roughly constant. [GPU utilization screenshot attached]

  2. Why don't you use a single model to detect cars, pedestrians, and cyclists, i.e. use 8 different categories: background, front-view and side-view car/pedestrian/cyclist, and other objects? If it is possible, how would one implement it?

Thank you for your great work. Best wishes!

WeijingShi commented 2 years ago

Hi @cwhYee

  1. I'm not quite sure about the low utilization; a more careful inspection is needed to find the bottleneck. If the bottleneck is on the CPU side, increasing the number of CPU preprocessing threads may help. In train_config, num_load_dataset_workers controls the number of parallel processes used to load the data. You might also put the data files on an SSD in case data reading is the limit.

  2. You can do that by adding more classes to the classification/regression head. The following is an example of putting them together:
https://github.com/WeijingShi/Point-GNN/blob/48f3d79d5b101d3a4b8439ba74c92fcad4f7cab0/train.py#L108
https://github.com/WeijingShi/Point-GNN/blob/48f3d79d5b101d3a4b8439ba74c92fcad4f7cab0/dataset/kitti_dataset.py#L1132
However, cars and pedestrians have quite different dimensions (which need different radii when creating a single-scale graph), and their sample numbers are very imbalanced in KITTI too. So we trained separate models to keep the experiments cleaner. For a unified model, we would most likely need a UNet- or FPN-style graph structure to handle the multiple object dimensions better (see the sketch after this list).
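
As a rough illustration of what a unified label set could look like (the class names and ordering below are hypothetical, not the repo's actual encoding at the linked lines), the per-point classification head would simply output one logit per entry in a combined list such as:

```python
# Hypothetical combined label map for a single unified model.
COMBINED_CLASSES = [
    'Background',        # 0
    'Car_front',         # 1
    'Car_side',          # 2
    'Pedestrian_front',  # 3
    'Pedestrian_side',   # 4
    'Cyclist_front',     # 5
    'Cyclist_side',      # 6
    'Other',             # 7
]
NUM_CLASSES = len(COMBINED_CLASSES)  # the classification head then outputs 8 logits per point
```

The label assignment in the dataset code and the number of classes in the model config would both need to be updated consistently, and the graph radius and box-encoding sizes would likely need to become class-dependent, which is exactly where the single-scale graph becomes the limitation mentioned above.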

Hope it helps, Weijing

cwhYee commented 2 years ago

Thank you so much! Though this may be a challenge for me, I will give it a try. Best wishes!

curiousboy20 commented 2 years ago

> Hi @cwhYee,
>
> These intermediate printouts are point-level results, i.e. the classification of points belonging to different categories. class_idx 0 is the background class, class_idx 1 and 2 are the front-view and side-view cars, and class_idx 3 is the other objects.
>
> The background class has far more points than the other classes. That's why it gets good numbers.
>
> Meanwhile, the network also regresses a bounding box for each point that belongs to a valid category (in this case, cars).
>
> Thanks,

Hi @WeijingShi, could you explain more about the class meanings in the pedestrian-cyclist training? I get class_0 through class_5 while training ped_cyl_auto with the KITTI dataset. Additionally, when I use my custom dataset, I get mAP=1 for class_0 but 0 for the other classes. I wonder what class_0 and the other classes mean. Thank you

WeijingShi commented 2 years ago

Hi @curiousboy20,

Thanks for checking out this work.

For pedestrian-cyclist, the ids are:
id 0: Background
id 1: front-view Pedestrian
id 2: side-view Pedestrian
id 3: front-view Cyclist
id 4: side-view Cyclist
id 5: others

mAP=1 on id 0 and mAP=0 for the others usually means one of two things:

  1. The dataset contains overwhelmingly more background points than object points, which causes an imbalance problem for the network's classification. Balancing methods, such as giving more weight to object points, might help (see the sketch after this list).
  2. Some of the samples do not have any object points at all. If mAP is computed only on those samples, it is zero by default.
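
As a hedged sketch of option 1 (re-weighting), one could scale the per-point classification loss by a per-class weight. The weight values, tensor names, and loss function below are illustrative (TF1-style), not the repo's actual training code:

```python
import tensorflow as tf  # TF 1.x style graph code

num_classes = 6                                   # pedestrian/cyclist setting: ids 0..5
labels = tf.placeholder(tf.int32, [None])         # per-point class ids
logits = tf.placeholder(tf.float32, [None, num_classes])

# Down-weight the dominant background class (id 0), keep object classes at 1.0.
class_weights = tf.constant([0.1, 1.0, 1.0, 1.0, 1.0, 1.0])
per_point_weights = tf.gather(class_weights, labels)
loss = tf.losses.sparse_softmax_cross_entropy(
    labels=labels, logits=logits, weights=per_point_weights)
```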

Hope it helps, Weijing