Can't reproduce Robotcar results

RuotongWANG commented 2 years ago

Hi, I tried to reproduce your results on the RobotCar Seasons V2 test set by submitting to the challenge submission server. I used the released performance-focused model, which is pre-trained on the MSLS dataset, but the result I got was incorrect [result screenshot not included]. I also tried the model pre-trained on Pitts30k, and those results are not correct either. The results on other datasets, however, are normal. Is the model version I used wrong? Could you release the model state that achieves the results on the RobotCar dataset shown in the paper? Or could you provide the results on the test set split by condition, as in Supplementary Table 1? Thank you so much.

Best regards,

Tobias-Fischer commented 2 years ago

Hi,

Could you please let us know the complete process that you used to obtain these results? In particular, how you map the best match to a pose?

Best, Tobias

RuotongWANG commented 2 years ago

I directly used the pose of the best-matched reference image as the estimated pose of the query. I also evaluated the SuperGlue method with the same procedure and got a normal result [result screenshot not included]. So I think there might be something wrong with the configuration or the model state that I used.

Tobias-Fischer commented 2 years ago

Ok - @StephenHausler - let's sit together at some point to find where the culprit lies.

marialeyvallina commented 2 years ago

Hi @StephenHausler, @Tobias-Fischer, a few days ago I ran the Pittsburgh_WPCA4096 and MSLS_WPCA4096 models on RobotCar Seasons and obtained the following results with NetVLAD retrieval:

Pittsburgh_WPCA4096: day-all: 7.3 29.2 91.3, night-all: 0.9 2.6 2.4 -> overall 5.9 23.3 73.9. In the paper you report 7.0 24.9 76.6 for NetVLAD.

MSLS_WPCA4096: day-all: 6.2 23.1 83.5, night-all: 0 0.5 4.2 -> overall 5.0 18.58 67.8

I calculate the overall as the weighted mean of the day and night numbers, based on the number of images taken at day and at night: overall = (day * 9300 + night * 2634) / (9300 + 2634)

For the Pittsburgh model the difference with the reported numbers seems reasonable to me (like what would happen between two different trainings), so I think that the model is probably fine and the problem lies in the Patch-NetVLAD feature extraction part. I hope this info helps with the issue.

HeartbreakSurvivor commented 2 years ago

Hi, could you please tell me whether the dataset you ran the Pittsburgh_WPCA4096 model on is RobotCar Seasons V1 or V2?

marialeyvallina commented 2 years ago

Hi @HeartbreakSurvivor, I ran RobotCar Seasons V2.

HeartbreakSurvivor commented 2 years ago

Hi, my point is that RobotCar Seasons V1 has 9300 + 2634 = 11934 query images, while RobotCar Seasons V2 has 1872 query images. You said you ran on RobotCar Seasons V2, but you calculated the overall using this:

overall = (day * 9300 + night * 2634) / (9300 + 2634)

I don't know why, but it doesn't matter.

What I really wonder is how you got these results. Did you just follow the Quick Start in the README.md file? I also ran the Pittsburgh_WPCA4096 model on RobotCar Seasons V2 but got wrong results, and I don't know why. I just ran feature_extract.py and feature_match.py to get 'PatchNetVLAD_predictions.txt', and took the pose of the best-matched database image as the estimated pose for each query image. I then submitted the result to the benchmark website but got wrong answers. So I hope you could tell me how you obtained your results, which seem reasonable. Thanks.
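The "best match per query" step described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the predictions file holds one comma-separated `query, reference` pair per line, with the references for each query listed best first and header lines starting with `#`. The function name `top1_matches` and the file names are hypothetical, so check the layout against the file actually written by feature_match.py.

```python
import io


def top1_matches(lines):
    """Return {query: best_reference}, keeping the first reference listed per query.

    Assumes 'query, reference' pairs, best match first (an assumption, not
    something confirmed in this thread).
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        query, reference = (part.strip() for part in line.split(",", 1))
        # Only the first occurrence of a query is kept (assumed rank 1).
        best.setdefault(query, reference)
    return best


# Hypothetical file contents, for illustration only.
example = io.StringIO(
    "# query_image, database_image\n"
    "q/0001.jpg, db/0420.jpg\n"
    "q/0001.jpg, db/0007.jpg\n"
    "q/0002.jpg, db/0133.jpg\n"
)
matches = top1_matches(example)
print(matches)
```

The estimated pose of each query is then the known pose of `matches[query]`, looked up in the reference poses shipped with the dataset.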

marialeyvallina commented 2 years ago

Hi again @HeartbreakSurvivor

Thank you very much for pointing this out; it seems that I did indeed mix up the two versions. The overall should instead be calculated as: overall = (day * 1443 + night * 429) / (1443 + 429)

The distribution is very similar between V1 and V2, so the results do not change much:

Pittsburgh_WPCA4096: day-all: 7.3 29.2 91.3, night-all: 0.9 2.6 2.4 -> overall 5.8 23.1 73.2

MSLS_WPCA4096: day-all: 6.2 23.1 83.5, night-all: 0 0.5 4.2 -> overall 4.8 17.9 65.3
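The weighted mean described above can be written as a small helper. This is a minimal sketch: the function name `weighted_overall` is made up for illustration, and the default counts are the V2 day and night query counts quoted in this comment.

```python
def weighted_overall(day, night, n_day=1443, n_night=429):
    """Weighted mean of a day score and a night score by number of query images."""
    return (day * n_day + night * n_night) / (n_day + n_night)


# First accuracy threshold for Pittsburgh_WPCA4096: day-all 7.3, night-all 0.9.
overall = weighted_overall(7.3, 0.9)
print(round(overall, 1))  # 5.8
```

Passing `n_day=9300, n_night=2634` gives the V1 weighting instead.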

I do indeed use feature_extract.py and feature_match.py, and then the NetVLAD_predictions.txt file (I have not evaluated Patch-NetVLAD yet, only NetVLAD). You have to be careful with the format of the poses, as explained in the dataset readme, but the retrieval itself should be fine.
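As an illustration of the pose-format point, here is a sketch of writing one submission line per query. It assumes the benchmark expects `image_name qw qx qy qz tx ty tz` (quaternion first, then translation); that layout, the helper name `pose_line`, and the example file name are assumptions here, and the dataset readme remains the authority on field order, rotation convention, and how image names must be written.

```python
def pose_line(image_name, quaternion_wxyz, translation_xyz):
    """Format one pose as 'name qw qx qy qz tx ty tz' (assumed layout)."""
    values = list(quaternion_wxyz) + list(translation_xyz)
    return image_name + " " + " ".join(f"{v:.6f}" for v in values)


# Hypothetical image name and pose, for illustration only.
line = pose_line("rear/0001.jpg", (1.0, 0.0, 0.0, 0.0), (0.5, -1.25, 3.0))
print(line)
```

A mismatch in any of these conventions can make correct retrievals score as wrong, so it is worth checking before suspecting the model.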

HeartbreakSurvivor commented 2 years ago

Thank you very much for the reply. I will check my code.

HeartbreakSurvivor commented 2 years ago

Hi again @marialeyvallina, it seems that I have the same problem as you. I ran the Pittsburgh_WPCA4096 model on RobotCar Seasons V1 and obtained the following results with NetVLAD retrieval:

day-all: 6.3 / 25.4 / 87.6, night-all: 0.8 / 2.5 / 16.5

which seems reasonable to me. But when I use Patch-NetVLAD retrieval, the result seems wrong:

day-all: 2.1 / 8.3 / 36.7, night-all: 0.1 / 1.3 / 13.9

I tested Pittsburgh_WPCA4096 on RobotCar Seasons V1 twice, just in case, and got the same result both times.

So I agree with your point: the problem may lie in the Patch-NetVLAD feature extraction or feature matching part. Hi @Tobias-Fischer, any hints about this issue? Or did you test on the RobotCar Seasons V1 dataset? If so, could you please provide the test results?

Tobias-Fischer commented 2 years ago

Hi, @StephenHausler and I will be looking at this. However, the holiday season is coming up and we're tied up with other projects.

We haven't ever checked V1 as far as I remember.

I'm assuming you are all aware that lower scores are better for NetVLAD (distances), but higher scores are better for Patch-NetVLAD (number of inliers)? So it needs an argmax instead of an argmin to get the top-1 match.
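To make the argmin/argmax point concrete, here is a small sketch with made-up scores. The helper `top1_index` and the example numbers are hypothetical and are not part of the Patch-NetVLAD code.

```python
def top1_index(scores, higher_is_better):
    """Index of the best candidate: argmax for inlier counts, argmin for distances."""
    pick = max if higher_is_better else min
    return pick(range(len(scores)), key=scores.__getitem__)


# Made-up scores for three reference candidates of one query.
netvlad_distances = [0.82, 0.31, 0.57]  # lower is better
patch_inliers = [12, 30, 48]            # higher is better

print(top1_index(netvlad_distances, higher_is_better=False))  # 1
print(top1_index(patch_inliers, higher_is_better=True))       # 2
# Applying the NetVLAD rule to the inlier counts picks the worst candidate:
print(top1_index(patch_inliers, higher_is_better=False))      # 0
```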

QVPR / Patch-NetVLAD

Can't reproduce Robotcar results #45