TencentYoutuResearch / PersonReID-NAFS

Code for "Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search"
https://arxiv.org/abs/2101.03036

question about performance #7

Open · baixiao930 opened this issue 3 years ago

baixiao930 commented 3 years ago

I trained the model without any changes, using the default training strategy (set in run.sh), but obtained 54.2% R@1. Could you please tell me how to achieve the 61.5% R@1 reported in the paper? Should I change the training strategy or some hyper-parameters? Many thanks for your help.

wettera commented 3 years ago

> I trained the model without any changes, using the default training strategy (set in run.sh), but obtained 54.2% R@1. Could you please tell me how to achieve the 61.5% R@1 reported in the paper? Should I change the training strategy or some hyper-parameters? Many thanks for your help.

Thanks for your interest in our work. Two days ago, I ran the code again using four Tesla P40 GPUs, and I have uploaded the model and training log. You may need to use 4 GPUs to run the code; the previous experiments were all done with 4 GPUs.
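
In case it helps, nothing beyond the standard PyTorch multi-GPU setup should be needed: make all four GPUs visible in run.sh and wrap the network with nn.DataParallel. The sketch below only shows that generic pattern with a small placeholder model; it is not our actual network or script arguments.

```python
import os
import torch
import torch.nn as nn

# In run.sh this corresponds to prefixing the training command with something
# like `CUDA_VISIBLE_DEVICES=0,1,2,3`; setting it here also works, as long as
# it happens before the first CUDA call.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1,2,3")

# Placeholder network standing in for the actual NAFS model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch along dim 0 across the visible GPUs
    # and gathers the outputs back on the default device (GPU 0).
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(64, 512).cuda()   # a global batch of 64 -> 16 samples per GPU on 4 GPUs
out = model(x)
print(out.shape, "computed on", torch.cuda.device_count(), "visible GPU(s)")
```

With four visible GPUs, each device therefore processes a quarter of every batch during the forward pass.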

baixiao930 commented 3 years ago

Thanks for your reply. I ran the code again with 4 GPUs, but obtained only a small improvement. I compared my training log with the one you uploaded: with exactly the same settings, my model reaches only 48% R@1 in the first 20 epochs, much lower than in your train.log. I think this is what causes the gap in final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

NovaMind-Z commented 3 years ago

> I trained the model without any changes, using the default training strategy (set in run.sh), but obtained 54.2% R@1. Could you please tell me how to achieve the 61.5% R@1 reported in the paper? Should I change the training strategy or some hyper-parameters? Many thanks for your help.

> Thanks for your interest in our work. Two days ago, I ran the code again using four Tesla P40 GPUs, and I have uploaded the model and training log. You may need to use 4 GPUs to run the code; the previous experiments were all done with 4 GPUs.

Hi, thanks for sharing your brilliant work! I tried to reproduce your results using 4 GPUs, and it seems that multi-GPU training with this code needs extra handling in the loss module: when each GPU only sees part of the batch (batch size = 16 per GPU), the distributions of p and q are affected. Since you mentioned you have trained with 4 GPUs, did you add extra code to handle this? I would appreciate it if you could share your ideas.
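
For illustration, the kind of change I have in mind is to let the nn.DataParallel wrapper return only the embeddings (which it gathers back into the full batch on the default device) and to compute the matching loss outside the wrapper, so that p and q are built over the whole batch. This is only a rough sketch of that idea, not the repository's actual code; the encoder, the loss, and the tensor shapes below are stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def batch_matching_loss(img_emb, txt_emb, labels, eps=1e-8):
    """Toy CMPM-style loss: builds in-batch matching distributions p and q,
    so it only behaves as intended when it sees the full batch at once."""
    sim = img_emb @ txt_emb.t()                                   # (B, B) similarities
    match = (labels.unsqueeze(1) == labels.unsqueeze(0)).float()  # 1 where identities agree
    q = match / (match.sum(dim=1, keepdim=True) + eps)            # target distribution
    p = F.softmax(sim, dim=1)                                     # predicted distribution
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()

class ToyEncoder(nn.Module):
    """Stand-in for the image/text branches: returns embeddings only, so each
    DataParallel replica never needs to see the whole batch."""
    def __init__(self, dim=256):
        super().__init__()
        self.img_fc = nn.Linear(2048, dim)
        self.txt_fc = nn.Linear(768, dim)

    def forward(self, img_feat, txt_feat):
        return (F.normalize(self.img_fc(img_feat), dim=1),
                F.normalize(self.txt_fc(txt_feat), dim=1))

model = nn.DataParallel(ToyEncoder()).cuda()           # the batch is split across GPUs here
img_feat = torch.randn(64, 2048).cuda()
txt_feat = torch.randn(64, 768).cuda()
labels = torch.randint(0, 16, (64,)).cuda()

img_emb, txt_emb = model(img_feat, txt_feat)           # outputs gathered back: full batch of 64
loss = batch_matching_loss(img_emb, txt_emb, labels)   # p and q built over all 64 samples
loss.backward()
```

Because the embeddings are gathered onto the default device before the loss is evaluated, the per-GPU split no longer changes the distributions, even though each replica only processed 16 samples.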

wettera commented 3 years ago

> Thanks for your reply. I ran the code again with 4 GPUs, but obtained only a small improvement. I compared my training log with the one you uploaded: with exactly the same settings, my model reaches only 48% R@1 in the first 20 epochs, much lower than in your train.log. I think this is what causes the gap in final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

Please check your environment. You can first evaluate the model we released and see whether it reaches the same performance reported in our training log. If it does not, feel free to contact us.
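
As a first sanity check, it is worth confirming that the released checkpoint loads cleanly into the model before running evaluation; key mismatches or version problems usually show up at this step. The snippet below is only a generic sketch of that check: the import path, constructor, and file name are placeholders, not the exact ones in this repository.

```python
import torch

# Placeholder import and file name: substitute the repository's actual model
# class, configuration, and checkpoint path.
from models.model import Model  # hypothetical import

model = Model()  # hypothetical constructor
ckpt = torch.load("nafs_released_model.pth.tar", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # handle a raw state_dict or a wrapped dict

# Strip a possible 'module.' prefix left over from DataParallel training.
state_dict = {k[len("module."):] if k.startswith("module.") else k: v
              for k, v in state_dict.items()}

result = model.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```

If both lists are empty, the checkpoint matches the model definition, and any remaining gap is more likely to come from the data or the environment.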

yyll1998 commented 3 years ago

> Thanks for your reply. I ran the code again with 4 GPUs, but obtained only a small improvement. I compared my training log with the one you uploaded: with exactly the same settings, my model reaches only 48% R@1 in the first 20 epochs, much lower than in your train.log. I think this is what causes the gap in final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

When I try to run the code with 4 GPUs, it always fails. What did you change to make it run on 4 GPUs? Could you share it with me? Thank you very much.

NovaMind-Z commented 3 years ago

> Thanks for your reply. I ran the code again with 4 GPUs, but obtained only a small improvement. I compared my training log with the one you uploaded: with exactly the same settings, my model reaches only 48% R@1 in the first 20 epochs, much lower than in your train.log. I think this is what causes the gap in final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

Have you reached the 62% performance now? I am training this model on 4 RTX 3090s and running into the same problem as you.

NovaMind-Z commented 3 years ago

> Thanks for your reply. I ran the code again with 4 GPUs, but obtained only a small improvement. I compared my training log with the one you uploaded: with exactly the same settings, my model reaches only 48% R@1 in the first 20 epochs, much lower than in your train.log. I think this is what causes the gap in final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

> When I try to run the code with 4 GPUs, it always fails. What did you change to make it run on 4 GPUs? Could you share it with me? Thank you very much.

Is your PyTorch version >= 1.5.0? I ran into a StopIteration problem when using PyTorch 1.7.
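
For anyone hitting the same thing: on PyTorch >= 1.5, the replicas created by nn.DataParallel no longer expose their parameters through self.parameters(), so a forward() that calls something like next(self.parameters()) to look up the device or dtype raises StopIteration on the replicas. The toy module below illustrates the pattern and one common workaround; it is not this repository's code.

```python
import torch
import torch.nn as nn

class BrokenModule(nn.Module):
    """Raises StopIteration under DataParallel on PyTorch >= 1.5, because the
    per-GPU replicas have no registered parameters of their own."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        device = next(self.parameters()).device   # StopIteration on replicas
        return self.fc(x.to(device))

class FixedModule(nn.Module):
    """Same logic, but reads the device from a registered buffer (or simply
    from the input tensor) instead of iterating over self.parameters()."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)
        self.register_buffer("dev_probe", torch.zeros(1))  # buffers are copied to replicas

    def forward(self, x):
        device = self.dev_probe.device             # or just x.device
        return self.fc(x.to(device))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(FixedModule()).cuda()
    out = model(torch.randn(16, 8).cuda())
    print(out.shape)
```

Downgrading to a PyTorch version below 1.5, or removing the self.parameters() lookup from forward(), both avoid the error.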

wettera commented 3 years ago

> Thanks for your reply. I ran the code again with 4 GPUs, but obtained only a small improvement. I compared my training log with the one you uploaded: with exactly the same settings, my model reaches only 48% R@1 in the first 20 epochs, much lower than in your train.log. I think this is what causes the gap in final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

> When I try to run the code with 4 GPUs, it always fails. What did you change to make it run on 4 GPUs? Could you share it with me? Thank you very much.

> Is your PyTorch version >= 1.5.0? I ran into a StopIteration problem when using PyTorch 1.7.

Is your environment the same as the following:

Video-AD commented 2 years ago

> Thanks for your reply. I ran the code again with 4 GPUs, but obtained only a small improvement. I compared my training log with the one you uploaded: with exactly the same settings, my model reaches only 48% R@1 in the first 20 epochs, much lower than in your train.log. I think this is what causes the gap in final performance. Are there any other adjustments I can make to improve the performance? Thanks a lot for your help. train.log.txt

> Have you reached the 62% performance now? I am training this model on 4 RTX 3090s and running into the same problem as you.

Hi, I'm currently running into the same problem as you: I can only reproduce about 55% performance. Have you solved it? Can you reach the 62% performance now?