VisualComputingInstitute / triplet-reid

Code for reproducing the results of our "In Defense of the Triplet Loss for Person Re-Identification" paper.
https://arxiv.org/abs/1703.07737

Questions on using Inception_ResNet_v1 and test accuracy #8

Open cptay opened 7 years ago

cptay commented 7 years ago

Hi, I am new to deep learning and thus may not understand your paper fully; I hope that's all right with you. I tried to implement batch_hard using Inception_resnet_v1, trained from scratch on the Market-1501 dataset. The rank-1 CMC is only about 70%. I did not implement re-ranking or the augmented test. Do you think this model is able to get rank-1 CMC above 80%?
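For concreteness, this is roughly what I mean by batch_hard (a simplified TensorFlow sketch of my understanding of the paper, not the repo's code; the margin value is just an example I am trying):

```python
import tensorflow as tf

def batch_hard_triplet_loss(embeddings, pids, margin=0.2):
    """Simplified batch-hard triplet loss over a batch of embeddings and person IDs."""
    # Pairwise Euclidean distances, shape [B, B].
    diffs = tf.expand_dims(embeddings, 1) - tf.expand_dims(embeddings, 0)
    dists = tf.sqrt(tf.reduce_sum(tf.square(diffs), axis=-1) + 1e-12)

    # Mask of same-identity pairs (includes the diagonal, whose distance is ~0).
    same_id = tf.equal(tf.expand_dims(pids, 1), tf.expand_dims(pids, 0))
    pos_mask = tf.cast(same_id, tf.float32)

    # Hardest positive: the farthest same-identity sample for each anchor.
    hardest_pos = tf.reduce_max(dists * pos_mask, axis=1)
    # Hardest negative: the closest different-identity sample (same-id entries pushed far away).
    hardest_neg = tf.reduce_min(dists + 1e6 * pos_mask, axis=1)

    return tf.reduce_mean(tf.maximum(hardest_pos - hardest_neg + margin, 0.0))
```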

The second problem I faced is that the test results fluctuate a lot; the rank-1 value can range from 60% to 70%. Can you shed some light on the test strategy, or point me to papers online? I am using 100 identities to verify the trained model.

Thanks!

lucasb-eyer commented 7 years ago

Hi and welcome to deep learning :smile:

Neither @Pandoro nor I have ever used Inception-ResNet, so we can't really know, but I'd expect it to perform similarly to ResNet, so you should definitely be able to get it to around or above 80% IMO.

Since you are new: contrary to what many papers want you to believe, the most important thing of all to tune is the learning rate. It is possible that you'll need a different learning rate than we used, because you're using a different type of model.

The next question is: why are you only using 100 identities to verify the model? The Market1501 dataset comes with many, many more test identities and a standard split. Importantly, the rank-1 value is not comparable across differently-sized test sets: rank-1 gets easier as the gallery gets smaller. I highly recommend you use the standard split and the standard evaluation code (https://github.com/zhunzhong07/IDE-baseline-Market-1501/tree/master/market_evaluation), so that you can actually compare your results to papers' results. Once you get similar performance to current papers, you can switch to training on mostly everything for actual deployment (if you don't want to report results in papers).
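Just to make the metric concrete, a bare-bones rank-1 computation looks like this (a simplified sketch only; the official Market-1501 evaluation additionally filters junk images and same-camera matches, so use the linked code for real numbers):

```python
import numpy as np

def rank1(query_emb, query_pids, gallery_emb, gallery_pids):
    """Fraction of queries whose single nearest gallery sample has the same identity."""
    # Euclidean distances between every query and every gallery embedding, shape [Q, G].
    dists = np.linalg.norm(query_emb[:, None] - gallery_emb[None, :], axis=-1)
    # Identity of the nearest gallery sample for each query.
    nearest_pid = gallery_pids[np.argmin(dists, axis=1)]
    return np.mean(nearest_pid == query_pids)
```

With only 100 gallery identities, argmin has far fewer wrong identities to pick from, which is exactly why the number is not comparable to the full split.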

Unless by "test" you actually mean "validate"? For validation it is correct to split off a small part of the training set and use it only to find what works best, but not to compare to published results.

Finally, our validation results also fluctuated quite a bit (although not as much as you report); this is usually dealt with by using a learning-rate decay schedule. Typically, once the learning rate starts to decay, the scores "settle" at the higher end of the fluctuations, and the more it decays, the more stable they become. We barely had any fluctuation (less than 1%) once we decayed the learning rate.
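In TensorFlow that can be as simple as the following (the base rate and step counts here are placeholders for you to tune, not our exact schedule):

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
# Multiply the base rate by decay_rate every decay_steps steps (placeholder numbers).
learning_rate = tf.train.exponential_decay(
    learning_rate=3e-4,
    global_step=global_step,
    decay_steps=10000,
    decay_rate=0.5,
    staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)
# optimizer.minimize(loss, global_step=global_step) then advances the schedule every step.
```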

Hope these answers help you understand things better!

cptay commented 7 years ago

Thanks so much for your prompt reply! I will look into the standard split method and try it out. Your advice on the proper use of the learning rate will be very valuable for my tests.

BTW, I also tried your TriNet setup. I am using TensorFlow Slim: I downloaded the pretrained weights, stripped the top layer of the ResNet-50 model, and added two fully connected layers with normalization in between. I froze the ResNet-50 layers and trained only the fully connected layers, roughly as sketched below. The best result I could obtain was only about 40% rank-1 CMC, and further training reduced the CMC score. Did you also train part of the ResNet-50 layers, or are there other methods involved? I feel like pulling my hair out now... lol
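Roughly, my setup looks like this (a simplified sketch; the layer sizes and the learning rate are just what I am currently trying, and the loss here is only a stand-in):

```python
import tensorflow as tf

features = tf.placeholder(tf.float32, [None, 2048])        # pooled output of the frozen ResNet-50
with tf.variable_scope('head'):
    x = tf.layers.dense(features, 1024)
    x = tf.layers.batch_normalization(x, training=True)    # the normalization between the two layers
    x = tf.nn.relu(x)
    embeddings = tf.layers.dense(x, 128)                   # final embedding

loss = tf.reduce_mean(tf.square(embeddings))               # stand-in for the actual triplet loss

# Only the new head is trained; the ResNet-50 weights never receive gradients.
head_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='head')
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)    # batch-norm moving averages
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=head_vars)
```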

Thanks

Pandoro commented 7 years ago

Hi there!

Sorry for my late reply, I just arrived back from travelling abroad and didn't have time to check my email before.

Generally speaking, you always tune the complete network unless it is specifically mentioned that only part of the network is tuned. So, as usual when starting from a pretrained network, we also tuned all the parameters of the network and not just the additional last layers. I hope you already tried this at some point and didn't pull all your hair out. ;)
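As an illustration (a toy sketch, not our actual training code; the lower learning rate is just a common choice when fine-tuning pretrained weights), the only change from a frozen-backbone setup is not restricting the optimizer's var_list:

```python
import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 2048])           # stand-in for the network input
with tf.variable_scope('backbone'):
    features = tf.layers.dense(inputs, 2048, tf.nn.relu)    # pretend pretrained ResNet-50 layers
with tf.variable_scope('head'):
    embeddings = tf.layers.dense(features, 128)              # newly added embedding layer

loss = tf.reduce_mean(tf.square(embeddings))                  # stand-in for the triplet loss

# Head-only training, i.e. what was described above:
head_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='head')
train_head_only = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=head_vars)

# Fine-tuning the complete network: no var_list, so backbone *and* head are updated.
train_everything = tf.train.AdamOptimizer(3e-4).minimize(loss)
```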

cptay commented 7 years ago

Hi,

How did the trip go? Fun? :)

I am testing the system now. Hopefully everything goes well. And yeah, my hair survived my brutal attack, lucky me...

I still have doubts, but I will try to solve them myself first. Hopefully you can release the training code soon!

Many thanks!
