cvcode18 / imbalanced_learning

mAP is only about 0.78 #3

Closed mensaochun closed 5 years ago

mensaochun commented 5 years ago

I have run the training code following your instructions. After training for about 30 epochs the mAP is 0.78, and after 70 epochs it is still about 0.78, which does not match the mAP of 0.86 reported in the original paper. May I ask how to reach the performance reported in the paper?

cvcode18 commented 5 years ago

Hi @mensaochun, I assume you are referring to the WIDER dataset. If that is the case, the first thing I would suggest is training a ResNet-101 and making sure you can obtain the ~83% mAP. To do that, make sure you have a learning rate scheduler, proper data augmentation, and the right optimizer (SGD w/ momentum); a rough setup is sketched below. The supplementary material here can be useful. Once you achieve that, start adding the different bits and pieces proposed in this work and track the performance improvements. The mAP should get up to 86% on the test set, but feel free to follow up should you face any obstacles.
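A minimal PyTorch sketch of such a baseline setup (the hyperparameters, crop sizes, and augmentation choices here are illustrative assumptions, not our exact configuration):

```python
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T

NUM_ATTRIBUTES = 14  # WIDER-Attribute has 14 binary attributes

# Standard data augmentation; the exact sizes and choices are assumptions.
train_transform = T.Compose([
    T.Resize((256, 256)),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# ImageNet-pretrained ResNet-101 with a multi-label attribute head.
model = models.resnet101(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_ATTRIBUTES)

# SGD with momentum plus a step learning-rate scheduler.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

criterion = nn.BCEWithLogitsLoss()  # per-attribute binary classification
```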

zx3Leonoardo commented 5 years ago

I set every configuration as described in the instruction file you mentioned, but I only get an mAP of ~80.5 in the first step, i.e., freezing the other parts and training the ResNet-101 alone.

nsarafianos commented 5 years ago

Hi @zx3Leonoardo

Thank you for your interest in our work. In another issue I wrote down some steps that I think can get you to a very good performance given a ResNet (check steps 1-3 here).

The person in that thread actually managed to get better results than what we report in the paper, which is what I would also expect from a ResNet-101 on WIDER. If I had to guess, your model is not learning well (either because of the learning rate or because of the input data). Please take a look at the input size and data augmentation, experiment a little more with the training, and then get back to me. If you are still not getting anywhere close to 83% mAP, please send me some more details of what you have tried and I will get back to you.

zx3Leonoardo commented 5 years ago

Thanks a lot for the reply @nsarafianos. I got to ~83 when first training the ResNet-101 backbone. Then I froze the backbone and fine-tuned the attention modules, which reached ~84.7. Finally, I unfroze the backbone and trained the two parts simultaneously, but got a lower mAP of ~82. So I want to make sure my training order is right. Did you also get ~84.7 mAP before training the two parts together? For reference, my staged schedule looks roughly like the sketch below.
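(The module names here are my own placeholders, not the repo's actual structure; this is just to show the freezing/unfreezing order:)

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Stand-in model: a ResNet-101 backbone plus an "attention" head.
class Net(nn.Module):
    def __init__(self, num_attributes=14):
        super().__init__()
        base = models.resnet101(pretrained=True)
        self.backbone = nn.Sequential(*list(base.children())[:-1])
        self.attention = nn.Linear(2048, num_attributes)

    def forward(self, x):
        return self.attention(self.backbone(x).flatten(1))

model = Net()

# Stage 2: freeze the backbone and fine-tune only the attention part.
for p in model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                            lr=1e-3, momentum=0.9)

# Stage 3: unfreeze everything and train both parts simultaneously.
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```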

nsarafianos commented 5 years ago

Hi @zx3Leonoardo

  1. I'm glad your results are getting better. The 84.7 you get is very close to the 85.0 we report in Table 2 of the paper, so you're on the right track :)
  2. The next thing I would suggest is making the attention mechanism a little bit "more complicated". One way to go is to add a second one, for example at another stage of block 3 (still working on the same image resolution but after a different layer of the ResNet). That way you extract "local" information from a different place, which can help you learn better features. You can put the two local representations together by concatenation, or by using a multi-layer perceptron (a few FC layers); see the sketch below. An alternative is to explore a slightly more sophisticated attention mechanism (such as the harmonious attention, CVPR 2017). I believe that playing with this attention mechanism can bring you at least 1-1.5%, so you will be very close to the results we report in the paper.
  3. Finally, when you unfreeze both modules, remember to fine-tune everything with a small learning rate (like 2e-6). If I remember correctly, the performance improvements come from the first few epochs of fine-tuning and then the validation loss starts plateauing.

I hope this helps.
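To illustrate the fusion in (2), here is a rough sketch; the dimensions and module layout are placeholders rather than our exact architecture:

```python
import torch
import torch.nn as nn

class FusedLocalHead(nn.Module):
    """Concatenates two pooled 'local' attention features and mixes them
    with a small MLP (a few FC layers). Dimensions are illustrative."""
    def __init__(self, feat_dim=1024, num_attributes=14):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_attributes),
        )

    def forward(self, local_a, local_b):
        # local_a / local_b: (batch, feat_dim) features pooled from two
        # attention modules attached after different layers of block 3.
        return self.mlp(torch.cat([local_a, local_b], dim=1))
```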

zx3Leonoardo commented 5 years ago

Thank you very much, your suggestions help a lot! However, I am still a little confused about the training process. I trained the ResNet-101 with the configuration you mentioned in that file and achieved mAP 83.3. Then, using that pretrained ResNet-101 as the initial model, I changed the BCE loss to the weighted & focal loss and got mAP ~83.38. In that fine-tuning step I also lowered the learning rate, which did not help. I also tried training the ResNet-101 with the weighted & focal loss from the beginning and got mAP ~83.00. Neither way gives an improvement, so I really want to know the right order in which to train the network. Should I train the different parts one after another, i.e.:

  1. train the ResNet-101 with BCE;
  2. train the ResNet-101 with the weighted & focal loss, initialized from the model of (1);
  3. train the network with the attention module, initialized from the model of (2)?

Or should I treat all the added elements as one whole part, call it B, train the ResNet-101 first, and then add B to the network and fine-tune? I'm not sure I have described my problem precisely. I really appreciate how patient you have been with this project and with my questions.
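For context, the weighted & focal loss I plugged in looks roughly like this (my own re-implementation, so not necessarily the exact formulation from the paper):

```python
import torch

def weighted_focal_bce(logits, targets, pos_weight, gamma=2.0):
    # logits, targets: (batch, num_attributes); targets are 0/1 labels.
    # pos_weight: (num_attributes,) tensor up-weighting rare positive labels,
    # e.g. derived from label frequencies in the training set.
    p = torch.sigmoid(logits)
    pt = torch.where(targets == 1, p, 1 - p)   # prob. of the true class
    w = torch.where(targets == 1, pos_weight, torch.ones_like(p))
    return (-w * (1 - pt).pow(gamma) * pt.clamp(min=1e-8).log()).mean()
```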

VuNguyen597 commented 5 years ago

Hi @zx3Leonoardo, can you share the code you used to reproduce this paper? Thank you very much!