Can I set proj_first = False when training v2vnet?

yifanlu0227 commented 2 years ago

Hi, thanks for your great work and clean code. I am studying it and have a small question about the v2vnet fusion.

I want to set proj_first = False so that the full feature map is used for each cav, but simply setting this flag to False in the yaml does not seem to work.

I find that pairwise_t_matrix[i,j] is all zeros when i == j, but I think it should be the identity matrix. https://github.com/DerrickXuNu/OpenCOOD/blob/25a5db74d3040e9ed1c2e4d284fc33819d25db25/opencood/data_utils/datasets/intermediate_fusion_dataset.py#L362-L382
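To make the point about the diagonal concrete, here is a minimal NumPy sketch of a pairwise transformation tensor that starts from identity matrices instead of zeros. It is not the repository's code: the function name `build_pairwise_t_matrix`, the `poses` argument (a list of 4x4 cav-to-world transforms), and the default `max_cav` are assumptions made for illustration.

```python
import numpy as np


def build_pairwise_t_matrix(poses, max_cav=5):
    """Hypothetical helper: poses is a list of 4x4 cav-to-world transforms.

    Returns an array of shape (max_cav, max_cav, 4, 4) where entry [i, j]
    maps points from cav i's frame into cav j's frame.
    """
    # Start from identity everywhere, so [i, i] and unused padding slots
    # are valid (no-op) transforms rather than all-zero matrices.
    pairwise = np.tile(np.eye(4), (max_cav, max_cav, 1, 1))
    for i, t_i in enumerate(poses):
        for j, t_j in enumerate(poses):
            if i != j:
                # T_i p_i = T_j p_j  =>  p_j = inv(T_j) @ T_i @ p_i
                pairwise[i, j] = np.linalg.solve(t_j, t_i)
    return pairwise
```

With an all-zero diagonal, warping a cav's own feature map by pairwise[i, i] would collapse it, which is why the identity seems the safer default when proj_first is False.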

Also, pairwise_t_matrix is built for all vehicles, but afterwards some cavs are filtered out by COM_RANGE. So won't the indices no longer correspond if you take [:N] on it? https://github.com/DerrickXuNu/OpenCOOD/blob/25a5db74d3040e9ed1c2e4d284fc33819d25db25/opencood/models/fuse_modules/v2v_fuse.py#L94-L97
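The index concern can be illustrated with a small NumPy sketch. Everything here is hypothetical (the helper names, the 2D positions, and the 70 m default range are assumptions, not the repository's API): it selects the rows and columns of the kept cavs explicitly rather than slicing with [:N].

```python
import numpy as np


def filter_by_range(positions, ego_index=0, com_range=70.0):
    """Hypothetical helper: indices of cavs within com_range of the ego."""
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(positions - positions[ego_index], axis=1)
    return np.flatnonzero(dist <= com_range)


def select_pairwise(pairwise, kept):
    """Keep only the rows/columns of the cavs that survived filtering.

    pairwise has shape (L, L, 4, 4); the result has shape (K, K, 4, 4),
    where K == len(kept) and entry [a, b] is pairwise[kept[a], kept[b]].
    """
    return pairwise[np.ix_(kept, kept)]
```

If the second of three cavs is out of range, the kept indices are [0, 2]; a plain pairwise[:2, :2] would instead return the transforms for cavs 0 and 1, which is the mismatch described above.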

yifanlu0227 commented 2 years ago

I also find that the training loss (point_pillar_v2vnet) is not decreasing 😨. I am using the default yaml settings, with proj_first=True.

Even if I start from the pretrained model, the loss starts low but rises rapidly.

[epoch 100][1/1593], || Loss: 0.0769 || Conf Loss: 0.0480 || Loc Loss: 0.0288
[epoch 100][2/1593], || Loss: 1.5813 || Conf Loss: 0.6716 || Loc Loss: 0.9097
[epoch 100][3/1593], || Loss: 210.8635 || Conf Loss: 209.9329 || Loc Loss: 0.9306
[epoch 100][4/1593], || Loss: 2.6982 || Conf Loss: 1.2400 || Loc Loss: 1.4582
[epoch 100][5/1593], || Loss: 9.6059 || Conf Loss: 8.5419 || Loc Loss: 1.0641
[epoch 100][6/1593], || Loss: 4.7194 || Conf Loss: 3.1573 || Loc Loss: 1.5620
[epoch 100][7/1593], || Loss: 5.6964 || Conf Loss: 4.3628 || Loc Loss: 1.3336
[epoch 100][8/1593], || Loss: 5.4645 || Conf Loss: 4.2582 || Loc Loss: 1.2063
[epoch 100][9/1593], || Loss: 5.2956 || Conf Loss: 4.0639 || Loc Loss: 1.2318
[epoch 100][10/1593], || Loss: 5.8338 || Conf Loss: 4.4298 || Loc Loss: 1.4040
[epoch 100][11/1593], || Loss: 5.4485 || Conf Loss: 4.1031 || Loc Loss: 1.3454
[epoch 100][12/1593], || Loss: 4.6731 || Conf Loss: 3.1722 || Loc Loss: 1.5010
[epoch 100][13/1593], || Loss: 4.5258 || Conf Loss: 3.7020 || Loc Loss: 0.8238
[epoch 100][14/1593], || Loss: 3.6519 || Conf Loss: 2.6879 || Loc Loss: 0.9640
[epoch 100][15/1593], || Loss: 4.5165 || Conf Loss: 3.3467 || Loc Loss: 1.1698

DerrickXuNu commented 2 years ago

Let me answer your training question first; I will respond to your first question later. V2VNet is not easy to train: it takes many more epochs than the other models. Please make sure you use a large batch size, and the learning rate needs to be small after a certain number of epochs. If you are fine-tuning the pre-trained model, make sure the learning rate starts very small. With proj_first set to true, I am 100% sure it can converge, as I have done it myself. Leave your email here and I will try to find the training log and send it to you later (I am not 100% sure I still have it, though).
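As a rough illustration of "small learning rate after certain epochs", here is a plain-Python sketch of a multi-step decay rule. The function name and the numbers used with it (base rate, milestones, decay factor) are assumptions for illustration only; they are not the settings used for the released V2VNet model.

```python
def multistep_lr(base_lr, epoch, milestones, gamma=0.1):
    """Hypothetical helper: learning rate after multi-step decay.

    The rate is multiplied by gamma once for every milestone epoch
    that has already been reached.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Illustrative values only: decay at epochs 15 and 30.
for epoch in (0, 15, 30):
    print(epoch, multistep_lr(2e-3, epoch, milestones=[15, 30]))
```

When resuming from a pre-trained checkpoint, the same idea applies: pick a starting rate that already reflects the decays the original run had gone through, rather than the initial rate from the yaml.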

DerrickXuNu commented 2 years ago

As for your first question: the version that supports setting proj_first to False has not been released yet, so it won't work for now. We plan to release it after some of our other papers are published. Please stay tuned for updates to this repo.

yifanlu0227 commented 2 years ago

Thanks!

DerrickXuNu commented 2 years ago

Hi, if you want, I can try to find the training log of V2VNet and send it to you. What's the best email to reach you?

yifanlu0227 commented 2 years ago

yifan_lu@sjtu.edu.cn. Thanks again!

DerrickXuNu / OpenCOOD

Can I set proj_first = False when training v2vnet? #13