Closed: tanjunyao7 closed this issue 3 years ago
This is an interesting question. If you look at the ablation study (table 4 in the paper), you can notice that:
* NV+HF-Net performs better than NV+SP, so the distillation improves the performance of the local features.
* NV+HF-Net performs better than HF-Net, so the distillation yields less robust global descriptors.
We did not aim at improving the performance on both tasks, but I believe that it is possible with better losses and careful balancing.
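Not code from this repository, just a minimal PyTorch sketch of what such balancing could look like: instead of hand-tuning the weights of the individual distillation losses, one can learn a per-task weight via homoscedastic uncertainty (Kendall et al.). The task losses passed in are placeholders.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Balance several task losses with learned homoscedastic-uncertainty
    weights: each loss L_i is scaled by exp(-s_i), and s_i is added as a
    regularizer, so the optimizer trades the tasks off automatically."""

    def __init__(self, num_tasks: int):
        super().__init__()
        # one log-variance parameter per task, initialized to 0 (weight = 1)
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # losses: list of scalar tensors, e.g. [loss_global, loss_local, loss_scores]
        total = torch.zeros((), device=self.log_vars.device)
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s
        return total

# usage sketch (the individual losses are placeholders):
# balancer = UncertaintyWeightedLoss(num_tasks=3)
# loss = balancer([loss_global, loss_local, loss_scores])
```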
Ok. In my case I only want the global retrieval part, since the localization is done by LiDAR. I want to improve the global retrieval performance on datasets with lots of dynamic objects. Have you tried training the model using only the global supervision from NV, without the local head? What if I have posed images, so that direct supervision, e.g. via a Siamese net architecture, is possible?
There is a whole body of literature on learning image retrieval from pose labels only, e.g. the work of Thoma et al. This is outside the scope of this repository.
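Out of scope here as stated, but purely as an illustration of what pose-only supervision can look like, below is a toy PyTorch triplet loss that mines positives and negatives from camera positions. The function name, distance thresholds, and mining rule are made-up assumptions for the example, not taken from Thoma et al. or from this repository.

```python
import torch
import torch.nn.functional as F

def pose_triplet_loss(descriptors, positions, pos_radius=10.0, neg_radius=25.0, margin=0.1):
    """Toy triplet loss on global descriptors, supervised only by camera positions.

    descriptors: (N, D) L2-normalized global descriptors of a batch
    positions:   (N, 3) camera centers from the known poses
    Images closer than pos_radius meters are treated as positives,
    images farther than neg_radius meters as negatives (arbitrary thresholds).
    """
    metric_d = torch.cdist(positions, positions)      # pairwise distances in meters
    desc_d = torch.cdist(descriptors, descriptors)    # pairwise descriptor distances
    not_self = ~torch.eye(len(descriptors), dtype=torch.bool, device=descriptors.device)

    losses = []
    for i in range(len(descriptors)):
        pos_mask = (metric_d[i] < pos_radius) & not_self[i]
        neg_mask = metric_d[i] > neg_radius
        if not pos_mask.any() or not neg_mask.any():
            continue
        hardest_pos = desc_d[i][pos_mask].max()       # furthest positive
        hardest_neg = desc_d[i][neg_mask].min()       # closest negative
        losses.append(F.relu(hardest_pos - hardest_neg + margin))
    return torch.stack(losses).mean() if losses else descriptors.new_zeros(())
```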
Hi, thanks for your work. I don't know much about knowledge distillation. As far as I understand, a student network will not outperform its teacher since it contains fewer parameters. But this student has multiple teachers, i.e. NetVLAD and SuperPoint, and the overall loss is defined as the weighted sum of the individual losses. So I wonder whether the weakness of one teacher network (i.e. NetVLAD) can be compensated by the other teacher (i.e. SuperPoint), resulting in the global descriptor of HF-Net being more robust and accurate than NetVLAD's.
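For context, the "weighted sum" structure referred to above looks roughly like the sketch below; this is not the repository's actual training code, and the tensor names, shapes, and choice of per-term losses are illustrative assumptions. The student is supervised by a global-descriptor term from NetVLAD and local terms from SuperPoint, combined with scalar weights.

```python
import torch.nn.functional as F

def multi_teacher_loss(student, netvlad_desc, sp_desc, sp_scores,
                       w_global=1.0, w_local=1.0, w_scores=1.0):
    """student: dict with the student's predictions
         'global_desc' (B, Dg), 'local_desc' (B, Dl, H, W), 'scores' (B, H, W)
    netvlad_desc / sp_desc / sp_scores: targets produced by the two teachers."""
    loss_global = F.mse_loss(student['global_desc'], netvlad_desc)   # NetVLAD teacher
    loss_local = F.mse_loss(student['local_desc'], sp_desc)          # SuperPoint descriptors
    loss_scores = F.binary_cross_entropy_with_logits(                # SuperPoint detections
        student['scores'], sp_scores)
    return w_global * loss_global + w_local * loss_local + w_scores * loss_scores
```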