facebookresearch / votenet

Deep Hough Voting for 3D Object Detection in Point Clouds
MIT License

Boxnet getting better results than Votenet #33

Open kentaroy47 opened 4 years ago

kentaroy47 commented 4 years ago

Sorry that this is not a general issue; I'm raising it because it may help people training on their own custom datasets.

I'm testing on my custom dataset and finding that BoxNet gets better results than VoteNet. To simplify training and dataset creation, the dataset has only a single object class and a single heading class.

BoxNet:
eval mean box_loss: 0.143415
eval mean center_loss: 0.033522
eval mean heading_cls_loss: 0.408684
eval mean heading_reg_loss: 0.021197
eval mean loss: 1.641386
eval mean neg_ratio: 0.340674
eval mean obj_acc: 0.952075
eval mean objectness_loss: 0.041447
eval mean pos_ratio: 0.659326
eval mean sem_cls_loss: 0.000000
eval mean size_cls_loss: 0.000000
eval mean size_reg_loss: 0.047828
0 0.8460753105326817
eval person Average Precision: 0.846075
eval mAP: 0.846075
eval person Recall: 0.984816
eval AR: 0.984816

VoteNet:
eval mean box_loss: 0.131423
eval mean center_loss: 0.038023
eval mean heading_cls_loss: 0.539432
eval mean heading_reg_loss: 0.001078
eval mean loss: 5.734286
eval mean neg_ratio: 0.920239
eval mean obj_acc: 0.984032
eval mean objectness_loss: 0.015435
eval mean pos_ratio: 0.018091
eval mean sem_cls_loss: 0.000000
eval mean size_cls_loss: 0.000000
eval mean size_reg_loss: 0.038379
eval mean vote_loss: 0.434288
eval person Average Precision: 0.382230
eval mAP: 0.382230
eval person Recall: 0.558568
eval AR: 0.558568

Has anyone experienced similar issues, or does anyone have tips?

Some things I noticed:

charlesq34 commented 4 years ago

Hi @kentaroy47

That's very interesting. Can you share a bit more about your dataset and use case?

kentaroy47 commented 4 years ago

@charlesq34 Great repo, by the way. Thanks for sharing the code.

The dataset comes from an indoor lidar scene, and I'm trying to detect people in it. The point cloud density is similar to SUN RGB-D.

Since the lidar is installed in a fixed position, I apply background subtraction to make the problem easier.

Another observation is that the training results are heavily affected by mean_size_arr. As you suggested, I may need to cluster the box sizes into several sub-classes for mean_size_arr, since the size of a person varies quite a lot.
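One way to obtain several size sub-classes is to cluster the ground-truth box dimensions and use the cluster centers as the mean sizes. The following is only a minimal sketch under assumptions: box sizes are available as an (N, 3) array, `cluster_box_sizes` is a hypothetical helper that is not part of the votenet code, and plain k-means is just one possible clustering choice.

```python
import numpy as np

def cluster_box_sizes(sizes, k, n_iters=50, seed=0):
    """Plain k-means over (N, 3) box sizes (hypothetical helper).

    Returns (k, 3) cluster centers and (N,) cluster labels.
    """
    sizes = np.asarray(sizes, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialise the centers from k distinct training boxes.
    centers = sizes[rng.choice(len(sizes), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign every box to its nearest center.
        dists = np.linalg.norm(sizes[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = sizes[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    dists = np.linalg.norm(sizes[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Toy example: two standing-like and two crouching-like boxes (made-up numbers).
sizes = np.array([[0.50, 0.50, 1.70],
                  [0.55, 0.50, 1.75],
                  [0.60, 0.90, 1.00],
                  [0.65, 0.95, 1.05]])
mean_sizes, labels = cluster_box_sizes(sizes, k=2)
```

Each cluster center could then serve as one entry of the mean size array, with the cluster label acting as the size class of the corresponding box. Whether this helps on a given dataset is something to verify empirically.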

orlitany commented 4 years ago

Hi Kentaro, I'm quite surprised the difference is so big. I would try to visualize the votes and figure out why they're hurting performance. If you can share these visualizations, even better :)

On Fri, Oct 11, 2019, 07:08 Kentaro Yoshioka notifications@github.com wrote:

Has anyone experienced similar issues or tips? I guess the problem is that the voting loss doesn't converge so well on the custom dataset.


charlesq34 commented 4 years ago

Thanks @kentaroy47 for the extra info. If the background is subtracted, then the voting shouldn't be too hard. It might be helpful to visualize the computed ground-truth votes to see whether they are easily predictable.
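As a starting point for such a check, ground-truth votes can be recomputed independently and compared against the stored vote files. The sketch below is a simplification under stated assumptions: boxes are axis-aligned (heading ignored), each point gets at most one vote, and `compute_votes` is a hypothetical helper rather than a function from the votenet repository.

```python
import numpy as np

def compute_votes(points, boxes):
    """Simplified ground-truth votes (hypothetical helper).

    points: (N, 3) xyz coordinates.
    boxes:  (M, 6) rows of (cx, cy, cz, l, w, h), assumed axis-aligned.
    Returns vote_mask (N,) bool and votes (N, 3), where votes[i] is the
    offset from point i to the center of the box containing it.
    """
    points = np.asarray(points, dtype=np.float64)
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 6)
    vote_mask = np.zeros(len(points), dtype=bool)
    votes = np.zeros_like(points)
    for box in boxes:
        center, half = box[:3], box[3:] / 2.0
        inside = np.all(np.abs(points - center) <= half, axis=1)
        # Simplification: only the first box containing a point casts a vote.
        new = inside & ~vote_mask
        votes[new] = center - points[new]
        vote_mask |= new
    return vote_mask, votes

# Toy scene: two points inside one box, one background point.
points = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.5], [5.0, 5.0, 5.0]])
boxes = np.array([[0.0, 0.0, 0.5, 1.0, 1.0, 2.0]])
vote_mask, votes = compute_votes(points, boxes)
voted_fraction = vote_mask.mean()  # share of points that lie on an object
```

Plotting `points + votes` for the masked points should collapse each object onto its center; if it does not, the vote generation is a likely suspect. The fraction of voted points also gives a quick sense of how sparse the supervision for the voting module is.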

jediofgever commented 4 years ago

@kentaroy47 Can you share the steps you followed to create your custom dataset? It would be a great help to others trying to do the same. Thank you for your time.

kentaroy47 commented 4 years ago

@jediofgever https://github.com/facebookresearch/votenet/blob/master/doc/tips.md I followed the instructions here and created custom versions of sunrgbd/sunrgbd_data.py and sunrgbd/model_util_config.py.

The only thing you should need to modify in the votenet code is the mean size of each class, which is used to regress the bounding boxes. The main work is generating the point cloud, bounding box, and vote numpy files. I highly recommend generating the SUN RGB-D dataset first (you need MATLAB) and looking at what those numpy files contain.

for fileid: hoge, the dataset should include:

chinacui commented 4 years ago

Hi @charlesq34. I have tried BoxNet and VoteNet on my own dataset, which has packing objects and a large background (roughly 80%). I found that BoxNet performs relatively better. Could the background be affecting the voting scheme? Thanks.

charlesq34 commented 4 years ago

@chinacui

It's possible. If the scene contains mostly background points, votes may not have very high density, so the difference between boxnet and votenet could be small.

A variation of the method is to predict binary scores for foreground and background classes for each point and then weight point features by the predicted scores (therefore foreground points contribute more to the voting).
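The weighting step of that variation could look roughly like the sketch below. It is an illustration only: `weight_features_by_foreground` is a hypothetical name, the per-point foreground logits are assumed to come from some extra prediction head, and using a sigmoid to turn them into scores is an assumption rather than something specified in this thread.

```python
import numpy as np

def weight_features_by_foreground(features, fg_logits):
    """Scale per-point features by predicted foreground scores (hypothetical).

    features:  (N, C) point features fed to the voting module.
    fg_logits: (N,) raw foreground-vs-background scores per point.
    Returns the weighted features (N, C) and the scores in [0, 1].
    """
    features = np.asarray(features, dtype=np.float64)
    fg_logits = np.asarray(fg_logits, dtype=np.float64)
    # Assumption: a sigmoid maps the binary logits to foreground scores.
    fg_prob = 1.0 / (1.0 + np.exp(-fg_logits))
    # Broadcast the (N,) scores over the feature dimension.
    return features * fg_prob[:, None], fg_prob

# Toy example: an uncertain point, a confident foreground point,
# and a confident background point.
features = np.ones((3, 2))
weighted, fg_prob = weight_features_by_foreground(features, [0.0, 10.0, -10.0])
```

With this weighting, confident background points contribute almost nothing to the features used for voting, while foreground points pass through nearly unchanged.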