Closed sbharadwajj closed 5 years ago
Hi Nikolaos, Thank you for acknowledging my question.
As you suggested, I trained it using binary cross-entropy, first freezing the weights of the primary network and training only the attention module.
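For anyone following along, the freeze-then-train-attention setup can be sketched like this in PyTorch (a minimal sketch, not the actual code from this thread; `backbone` and `attention` are placeholder modules standing in for the primary network and the attention module):

```python
import torch
import torch.nn as nn

# Placeholder modules: `backbone` stands in for the pre-trained primary
# network, `attention` for the attention module being trained.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
attention = nn.Sequential(nn.Conv2d(8, 1, 1), nn.Sigmoid())

# Freeze the primary network so only the attention module is updated.
for p in backbone.parameters():
    p.requires_grad = False

# Optimize only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in attention.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.BCELoss()  # binary cross-entropy, as suggested above

x = torch.randn(2, 3, 16, 16)
target = torch.rand(2, 1, 16, 16)
mask = attention(backbone(x))  # sigmoid output, valid input for BCELoss
loss = criterion(mask, target)
loss.backward()
optimizer.step()
```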
Hi Shrisha,
I do not remember how long it took, to be honest. 40 epochs for only the attention mechanism does not sound like too much, especially if you're using a learning-rate scheduler, but again I'm not sure. Let me run some experiments over the weekend (I'll use the ResNet-50 backbone as you do) and I'll get back to you early next week.
For the ResNet-50, I have plugged the attention modules at the 3rd (1024 features) and 4th (2048 features) layer. Oh, and I have used Pytorch.
Sure. That would be great!
@chichilicious Apologies for not getting back to you earlier; I have not forgotten you. Some things came up this past week that I had to work on. I will hopefully get back to you with numbers on Tuesday.
@nsarafianos Thank you for remembering and yes, I understand. I shall wait until next week.
Hi Shrisha,
I found some time and ran experiments with the PETA dataset.
That's all. I hope it's helpful.
Hi Nikolaos,
Thank you so much for taking your time off and running the experiments.
Thank you again for the tabulated results.
Hi Shrisha,
In the paper we used Adam everywhere for the PETA dataset, whereas in yesterday's experiments I switched to Nesterov SGD because it was the first one I found. For the results above I did not try to find the best optimizer/hyperparameters, so your results might differ a little.
As for Nesterov Accelerated Gradient, it's slightly different from vanilla momentum SGD in terms of how the updates are performed: the gradient is evaluated at the look-ahead point the momentum step would reach, rather than at the current parameters. There's an abundance of literature that explains it better than I will (for example here and here).
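In PyTorch the switch between the two is just the `nesterov` flag on `torch.optim.SGD` (which requires `momentum > 0`). A toy example on a scalar objective:

```python
import torch

# Toy objective f(w) = w^2, starting at w = 1, so the gradient is 2w.
w = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9, nesterov=True)

loss = (w ** 2).sum()
loss.backward()      # grad at w=1 is 2.0; first-step momentum buffer = 2.0
opt.step()           # Nesterov update: w -= lr * (grad + momentum * buffer)
```

With `nesterov=False` the same step would use only the buffer, so the two variants diverge from the second step onward on non-toy problems.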
Ah, alright. Then I will experiment with both the optimizers and observe the results. Thank you, I will go through the slides as well.
Thanks for all the guidance Nikolaos, it helped me a lot! :100:
@chichilicious @nsarafianos Sorry to bother you, but I find that the network in this repo is not the same as the one described in the paper. Can I ask whether your experimental results correspond to the network in the paper or the network in this repo?
Hi @valencebond,
This repo, as well as the paper, have a ResNet-101 backbone with attention modules plugged in at different levels of the backbone. In this thread, we were talking about a ResNet-50 since that's what the initial question was about :)
Let me know if you have any questions and I would be more than happy to help.
thanks for your reply @nsarafianos. In this repo, the attention mask is C512-C512-Cn with kernel size 3, but in the paper the attention mask is C256-C256-C with kernel size 1. The subnetwork also differs: C256-C512-C1024 here compared to the paper's C256-C512-C512. So I am confused about which settings produced the experimental results you discussed.
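For concreteness, here is how I read the two variants (a sketch of the conv stacks only, not code from either the repo or the paper; `in_ch` and `n` are placeholders for the actual channel counts):

```python
import torch
import torch.nn as nn

def mask_repo(in_ch, n):
    # Repo variant: C512-C512-Cn, 3x3 kernels (padding keeps spatial size).
    return nn.Sequential(
        nn.Conv2d(in_ch, 512, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(512, n, kernel_size=3, padding=1),
    )

def mask_paper(in_ch, n):
    # Paper variant: C256-C256-C, 1x1 kernels (purely channel-wise).
    return nn.Sequential(
        nn.Conv2d(in_ch, 256, kernel_size=1), nn.ReLU(),
        nn.Conv2d(256, 256, kernel_size=1), nn.ReLU(),
        nn.Conv2d(256, n, kernel_size=1),
    )
```

Both produce masks of the same spatial shape, but the repo variant has a 3x3 receptive field per layer and many more parameters.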
Oh my bad then. I'm out of office at this moment but I will check tomorrow and get back to you with an updated response.
@nsarafianos Thank you so much; looking forward to your reply.
Hi @valencebond,
I just checked and yes, you're right. The results in the paper were obtained with what is reported in the supplementary material (and not with what's here). The differences (# of nodes in the layer, and the kernel size) should have a minimal impact on the final performance, so I expect small differences. If your results are more than 1-1.5% off, then please let me know and I can run experiments again to double check.
In any case, please keep me posted :)
@nsarafianos Thanks again for your clear explanation. But the critical point is: in my experiments, which follow the paper settings, when I use a ResNet-50 pre-trained on ImageNet and train the backbone (50 epochs with SGD, lr 0.001), I get the results below without data augmentation. So I am confused about my results. Without the same baseline, I can't verify the effect of adding class weights or the attention module. And since there are no reimplementation results in this repo, I also can't find the error in my reimplementation.
@chichilicious can you share your code or experiment result in PETA? thanks a lot
Hi @valencebond ,
Given that this is a ResNet-50 and without any augmentation I would say that your results look reasonable from here. Moving to 101 or 152 will improve your performance by 2-3%.
Also if I remember correctly, since these datasets are not that big, data augmentation (resize to x1.25 and then grab random crops + horizontal flipping) is almost essential to improve the performance. Please take a look also at this paper for some data augmentation details.
Two things I would change, both in that paper and in ours: first, the input dimensions (there is no need for them to be square; 128x256 works fine), and second, you can pre-train on a re-id task, which provides a better initialization than ImageNet.
Follow-up Q since I might have missed it: Are these results you posted above on WIDER or PETA?
@valencebond Can you email me at shrishabharadwaj5@gmail.com? I can help you out with the code. I achieved an F1 of 86, and I think the mAP was around 83, though I am not very sure about the mAP score.
Hi,
I have a few doubts regarding PETA's training.