CalayZhou / MBNet

Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems (ECCV 2020)

Question about Implementation Details #10

Open echoofluoc opened 4 years ago

echoofluoc commented 4 years ago

Thanks for your work and kindly sharing your code!

I'd like to ask some questions about implementation details:

  1. From data/cache/kaist_train_data.npy, it seems you used 8963 image pairs from the KAIST dataset during training, together with the averaged annotations between the visible and LWIR modalities produced by AR-CNN. I notice that AR-CNN used 8892 image pairs, which overlap heavily with your choice. What protocol did you use to split the training data?
  2. I find that the double_resnet.hdf5 you provided is not the one pre-trained on ImageNet; did you use another dataset to obtain these ResNet backbone weights?
CalayZhou commented 4 years ago

Thanks for your interest in our work! I reply to the above questions one by one as follows:

  1. I have done several experiments on the KAIST dataset with different annotation settings. To deal with classes other than 'person', such as 'cyclist' and 'people', I found that integrating all classes into 'person' and performing single-class detection works best. I filter out training images that contain no pedestrians; small pedestrian samples are retained. I recommend using as many annotations as possible, and some extra data augmentations also help. For convenience, I convert the original .vbb annotations to .npy files in our code.

  2. Because there are two modalities (RGB and thermal) in multispectral pedestrian detection, we initialize the RGB and thermal network branches with the same model pre-trained on ImageNet. double_resnet.hdf5 clones the original ImageNet-pretrained ResNet50 into two copies so that the RGB and thermal branches are initialized independently; each copy has the same parameters.
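The cloning step can be sketched as follows (illustrative only, not the authors' script; the layer names and the `_lwir` suffix mimic the naming seen in double_resnet.hdf5, and the small lists stand in for real weight tensors):

```python
# Sketch: clone one set of pretrained weights so the RGB and thermal
# ("_lwir") branches start from identical parameters.
pretrained = {
    "res5a_branch1": [0.1, 0.2, 0.3],  # stand-in for a real weight tensor
    "bn5a_branch1": [1.0, 0.0],
}

double = {}
for name, weights in pretrained.items():
    double[name] = list(weights)            # RGB branch
    double[name + "_lwir"] = list(weights)  # thermal branch, identical copy

assert double["res5a_branch1"] == double["res5a_branch1_lwir"]
```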

echoofluoc commented 4 years ago

Thanks for your reply!

For the second question, please check the double_resnet.hdf5 you provided again. I've opened the hdf5 file to inspect its parameters. You said you just copied the ImageNet-pretrained weights from the RGB branch into the thermal branch, but the parameters of layers with the same name (for example, res5a_branch1 and res5a_branch1_lwir) are different, and neither matches the ImageNet-pretrained weights. Besides, there are weights for some other layers (for example, P3, P6, pred0_1_conv, pred0_1_rpn_class, etc.) in the file.
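A quick way to spot such mismatches (a sketch; the dict below stands in for weights loaded from the HDF5 file, e.g. with h5py, and the layer names and values are only examples):

```python
def mismatched_branches(weights, suffix="_lwir", tol=1e-12):
    """Return layer names whose RGB weights differ from their thermal twin."""
    bad = []
    for name, w in weights.items():
        twin = name + suffix
        if twin in weights:
            if any(abs(a - b) > tol for a, b in zip(w, weights[twin])):
                bad.append(name)
    return bad

# Example: res5a_branch1 differs from res5a_branch1_lwir, as observed.
weights = {
    "res5a_branch1": [0.10, 0.20],
    "res5a_branch1_lwir": [0.11, 0.19],
    "conv1": [0.5],
    "conv1_lwir": [0.5],
}
print(mismatched_branches(weights))  # -> ['res5a_branch1']
```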

So I think maybe you uploaded the wrong double_resnet.hdf5 file?

CalayZhou commented 4 years ago

Yes, you are right and very careful. Let me clarify the training procedure and where double_resnet.hdf5 comes from. First I initialize the model with ResNet50 pretrained on ImageNet; the model is then trained for just one iteration so that the RGB-branch parameters are copied into the thermal branch, and it is saved as double_resnet.hdf5 for convenient use in later training. That is why the parameters of layers with the same name (for example, res5a_branch1 and res5a_branch1_lwir) differ from the ImageNet-pretrained weights: the single training iteration introduces a slight difference. There are also some other layers' weights because double_resnet.hdf5 is saved according to MBNet's naming rules.

echoofluoc commented 4 years ago

Thanks for your reply!

That's quite a strange way to obtain the backbone weights. Doesn't it introduce instability, since you trained (on KAIST, maybe?) for only 1 iteration? The optimizer can produce quite different parameters in a single update step.

CalayZhou commented 4 years ago

Yes, it is trained on the KAIST dataset, and I set the learning rate very small (1e-4 -> 1e-10), so I think the parameters will not change too much, while still being adapted to the KAIST task. I admit it is not the preferred choice from the present point of view.
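For reference, a single plain-SGD step moves each weight by at most lr * |grad|, so with the learning rates quoted above the drift from the ImageNet initialization is tiny (a toy calculation, not MBNet code; the unit-magnitude gradient is an assumption):

```python
# Toy check: one SGD step is w_new = w - lr * grad, so the per-step change
# is bounded by lr * |grad| (up to floating-point rounding).
w, grad = 0.5, 1.0                 # assume a unit-magnitude gradient
for lr in (1e-4, 1e-10):           # the two ends of the schedule above
    w_new = w - lr * grad
    assert abs(w_new - w) <= lr * abs(grad) + 1e-15
```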