is the global descriptor of hfnet better than netvlad?

ethz-asl / hfnet

From Coarse to Fine: Robust Hierarchical Localization at Large Scale with HF-Net (https://arxiv.org/abs/1812.03506)

MIT License

is the global descriptor of hfnet better than netvlad? #56

Closed tanjunyao7 closed 3 years ago

tanjunyao7 commented 3 years ago

Hi author, thanks for your work. I don't know much about knowledge distillation. As far as I understand, a student network will not outperform its teacher since it has fewer parameters. But this student has multiple teachers, i.e. NetVLAD and SuperPoint. The overall loss is defined as the weighted sum of the individual losses. So I wonder whether the weakness of one teacher network (i.e. NetVLAD) can be compensated by another teacher (i.e. SuperPoint), so that the global descriptor of HF-Net ends up more robust and accurate than NetVLAD's.
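For readers following along, here is a minimal sketch of what "weighted sum of individual losses" means for a student with several teachers. It assumes fixed scalar weights and three terms (global descriptor, local descriptors, keypoint scores); the function names, the choice of L2 and cross-entropy terms, and the default weights are illustrative assumptions, not the repository's actual training code.

```python
import math


def l2_loss(student, teacher):
    """Squared L2 distance between two equal-length descriptor vectors."""
    if len(student) != len(teacher):
        raise ValueError("descriptor lengths differ")
    return sum((s - t) ** 2 for s, t in zip(student, teacher))


def cross_entropy(student_probs, teacher_probs, eps=1e-12):
    """Cross-entropy of the student's keypoint probabilities against the teacher's."""
    return -sum(t * math.log(s + eps) for s, t in zip(student_probs, teacher_probs))


def distillation_loss(global_s, global_t, local_s, local_t, kp_s, kp_t,
                      weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-teacher losses (hypothetical fixed weights).

    global_*: global descriptors (student vs. a NetVLAD-like teacher)
    local_*:  local descriptors (student vs. a SuperPoint-like teacher)
    kp_*:     keypoint probabilities (student vs. a SuperPoint-like teacher)
    """
    w_global, w_local, w_kp = weights
    return (w_global * l2_loss(global_s, global_t)
            + w_local * l2_loss(local_s, local_t)
            + w_kp * cross_entropy(kp_s, kp_t))
```

Note that each term only pulls one head of the student towards its own teacher. Any interaction between the teachers happens through the shared backbone, which is why the balance between the weights matters.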

sarlinpe commented 3 years ago

This is an interesting question. If you look at the ablation study (table 4 in the paper), you can notice that:

* NV+HF-Net performs better than NV+SP, so the distillation improves the performance of the local features.
* NV+HF-Net performs better than HF-Net, so the distillation yields less robust global descriptors.

We did not aim at improving the performance on both tasks, but I believe that it is possible with better losses and careful balancing.

tanjunyao7 commented 3 years ago

OK. In my case I only need the global retrieval part, since localization is done with lidar. I want to improve global retrieval performance on datasets with many dynamic objects. Have you tried training the model with only the global supervision from NV, without the local head? And what if I have posed images, so that direct supervision, e.g. via a Siamese network architecture, is possible?

sarlinpe commented 3 years ago

There is a whole body of literature on learning image retrieval from pose labels only, e.g. the work of Thoma et al. This is out of the scope of this repository.
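As a rough illustration of what supervision from pose labels can look like, here is a minimal sketch of triplet mining by camera position followed by a margin-based triplet loss on global descriptors. It is a generic metric-learning recipe, not the method of any specific paper and not part of this repository; the function names, the radii, and the margin are hypothetical values chosen for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mine_triplets(positions, pos_radius=5.0, neg_radius=25.0):
    """Build (anchor, positive, negative) index triplets from camera positions.

    Positives lie within pos_radius of the anchor, negatives at least
    neg_radius away. Both thresholds are hypothetical and in the same
    units as the positions.
    """
    triplets = []
    n = len(positions)
    for a in range(n):
        for p in range(n):
            if p == a or squared_distance(positions[a], positions[p]) > pos_radius ** 2:
                continue
            for q in range(n):
                if squared_distance(positions[a], positions[q]) >= neg_radius ** 2:
                    triplets.append((a, p, q))
    return triplets


def triplet_loss(desc_a, desc_p, desc_n, margin=0.1):
    """Hinge loss pushing the negative at least `margin` farther than the positive."""
    return max(0.0, squared_distance(desc_a, desc_p)
               - squared_distance(desc_a, desc_n) + margin)
```

In practice the descriptors would come from the network being trained and the loss would be averaged over a batch of mined triplets. Position alone ignores viewing direction, so a real setup would likely also check orientation or visual overlap when picking positives.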