Test results, and how to merge the local and global features

ZERO6666 commented 3 years ago

Sorry to disturb you.

I tested your code for vehicle key points and orientation. However, the MSE of stage 1 is 1.421 and the MSE of stage 2 is 1.650, which differs from what you describe in README.md. The CoarseRegressor result is even better than the FineRegressor result. Why?

Besides, I would like to understand the AAVER pipeline in detail. Could you upload a document?

Pirazh commented 3 years ago

Thanks for your interest in our work. The numbers you report are even better than what we report in the paper. Are you using the provided checkpoints, or did you train a model on your own? It is also possible that I worked further on the key-point localizer to improve the numbers after putting the code online, and uploaded the weights corresponding to that.

Note that these numbers are accurate enough. The manual labels (ground truths) identify only a single pixel as the key-point and treat all the surrounding pixels as background, which is not entirely accurate. That is why these points are dilated during training. So you can assume that once your error gets close to 1 pixel, you have what you need.

Another point to mention is that this metric (the mean squared error of the distance between predictions and ground truths), which has been used throughout previous works, is not very descriptive on its own. To measure the performance of your key-point localizer, it is a good idea to consider other metrics as well, such as the entropy of the predicted heatmap for each visible key-point. Treating each heatmap as a probability distribution helps you see how noisy or dispersed your heatmaps are. As you can see from the differences between the heatmaps after the first and second stages, the entropy of the second stage is much smaller than that of the first, which is very desirable.
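For readers who want to try the entropy metric, here is a minimal sketch in plain Python. It is not taken from the repository: the function name `heatmap_entropy`, the choice to clip negative values to zero, and normalizing by the sum (rather than, say, a softmax) are all assumptions made for illustration.

```python
import math


def heatmap_entropy(heatmap):
    """Shannon entropy (in nats) of a 2D heatmap treated as a probability distribution.

    Assumption: negative activations are clipped to zero and the remaining
    values are normalized by their sum. The repository may normalize differently.
    """
    values = [max(v, 0.0) for row in heatmap for v in row]
    total = sum(values)
    if total <= 0.0:
        raise ValueError("heatmap has no positive mass to normalize")
    entropy = 0.0
    for v in values:
        if v > 0.0:  # 0 * log(0) is taken as 0
            p = v / total
            entropy -= p * math.log(p)
    return entropy


# A sharply peaked heatmap versus a fully dispersed one (toy 3x3 examples).
peaked = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
dispersed = [[1.0, 1.0, 1.0] for _ in range(3)]

peaked_entropy = heatmap_entropy(peaked)
dispersed_entropy = heatmap_entropy(dispersed)
```

A lower value means the predicted mass is concentrated on fewer pixels. In this toy case the peaked map has zero entropy and the uniform map reaches the maximum for nine pixels, `log(9)`.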

The detailed pipeline of AAVER is documented in the paper: http://openaccess.thecvf.com/content_ICCV_2019/papers/Khorramshahi_A_Dual-Path_Model_With_Adaptive_Attention_for_Vehicle_Re-Identification_ICCV_2019_paper.pdf

ZERO6666 commented 3 years ago

Excuse me. I first tried merging the local features of all 20 key-points with the baseline, but the accuracy is lower than the baseline. Do you know why? Is this experimental result normal? The baseline mAP is 0.7382, and the mAP of the baseline with the 20 key-point features is 0.72362.

Pirazh commented 3 years ago

As mentioned in the paper, not all 20 key-points are used. Based on the orientation, a subset of them is selected, and their respective feature maps are concatenated to the intermediate feature maps after the res2 block and then fed to the subsequent ResNet blocks in the second path.
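The selection and concatenation step described above could be sketched roughly as follows. This is an illustration only, not the AAVER implementation: the `VISIBLE_KEYPOINTS` table, its orientation names and indices, and the function `select_and_concat` are hypothetical, and the heatmaps are assumed to already match the spatial size of the res2 output. Channels are represented as plain nested lists to keep the example self-contained.

```python
# Hypothetical mapping from an orientation label to the key-point indices
# treated as visible for it. The real mapping is defined in the paper.
VISIBLE_KEYPOINTS = {
    "front": [0, 1, 2],
    "rear": [3, 4, 5],
}


def select_and_concat(feature_maps, heatmaps, orientation):
    """Append the heatmaps of the orientation's key-points as extra channels.

    feature_maps: list of C channels, each an H x W nested list (e.g. res2 output).
    heatmaps: list of key-point heatmaps, each assumed to be H x W already.
    """
    selected = [heatmaps[i] for i in VISIBLE_KEYPOINTS[orientation]]
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    for hm in selected:
        if len(hm) != height or any(len(row) != width for row in hm):
            raise ValueError("heatmap size must match the feature map size")
    # Channel-wise concatenation: original channels first, then the selected heatmaps.
    return feature_maps + selected


# Toy example: 2 feature channels and 6 key-point heatmaps, all 2 x 2.
features = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
keypoint_heatmaps = [[[float(k), 0.0], [0.0, 0.0]] for k in range(6)]

merged = select_and_concat(features, keypoint_heatmaps, "front")
```

With tensors, the same idea would typically be a channel-dimension concatenation such as `torch.cat([features, selected], dim=1)`. The point is that only the orientation-dependent subset is appended, not all 20 heatmaps.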

Pirazh commented 3 years ago

Since there has been no further discussion here, I am closing this issue.

Pirazh / Vehicle_Key_Point_Orientation_Estimation

Test results, and how to merge the local and global features #11