cvg / SOLD2

Joint deep network for feature line detection and description
MIT License

Understanding of homography adaptation #76

Open jj12138 opened 1 year ago

jj12138 commented 1 year ago

Hello, thank you for your wonderful ideas! I would like to understand why the homography adaptation algorithm works. My guess is that the point-line features in the synthetic shapes dataset are mainly T-shaped, L-shaped and similar junctions, and the corners of rooms and furniture in real images mostly have these shapes too. The features in the synthetic shapes are therefore a subset of those found in the real world, so a model trained on them can also detect some real-world features. But how can this be explained more specifically? Are there any papers about it? Thanks again.

rpautrat commented 1 year ago

Hi, yes, what you explained is the basic idea behind training on synthetic data and generalizing to real data. Even though the network trained on synthetic data only learns to detect simple shapes, it can already generalize fairly well to real data. But since there is always a domain gap (synthetic != real images), we use homography adaptation to improve the quality of the ground truth. By warping the original image under different homographies, we see the junctions from different perspectives, which increases the chance that the network trained on synthetic data detects them correctly. This process was originally introduced in SuperPoint, in case you want to read more about it.
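
To make the process described above concrete, here is a minimal NumPy sketch of the aggregation step. It is not the SOLD2 or SuperPoint code: `random_homography`, `warp` and `homography_adaptation` are hypothetical helpers, the homography sampler is deliberately simplistic, warping uses nearest-neighbour sampling, and `detect` stands in for any function that maps an image to a junction heatmap of the same shape.

```python
import numpy as np

def random_homography(rng, size, strength=0.1):
    """Sample a mild random homography (hypothetical sampler, an assumption)."""
    h, w = size
    H = np.eye(3)
    H[:2, :2] += rng.uniform(-strength, strength, (2, 2))    # affine part
    H[:2, 2] = rng.uniform(-strength, strength, 2) * (w, h)  # translation
    H[2, :2] = rng.uniform(-strength, strength, 2) / (w, h)  # perspective
    return H

def warp(img, H):
    """Warp img by H (nearest neighbour). Also return a mask of valid pixels."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, look up its source pixel in img.
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ dst
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w), valid.reshape(h, w)

def homography_adaptation(img, detect, num_views=20, seed=0):
    """Average the detector's heatmaps over several warped views of img."""
    rng = np.random.default_rng(seed)
    total = np.zeros(img.shape)
    counts = np.zeros(img.shape)
    for i in range(num_views):
        # Use the unwarped image as the first view, random warps afterwards.
        H = np.eye(3) if i == 0 else random_homography(rng, img.shape)
        warped, _ = warp(img, H)
        heat = detect(warped)
        # Map the prediction back into the original image frame.
        heat_back, seen = warp(heat, np.linalg.inv(H))
        total += heat_back
        counts += seen  # number of views in which each pixel was visible
    return total / np.maximum(counts, 1)

# Toy usage: the "detector" returns its input, standing in for a network.
img = np.zeros((32, 32))
img[15:18, 15:18] = 1.0
agg = homography_adaptation(img, detect=lambda x: x)
```

The averaged heatmap `agg` can then be thresholded to obtain labels. A junction that the detector misses in some views still receives votes from the views where it is found.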

jj12138 commented 1 year ago

Thanks for your quick reply! But I would also like to know why the network's own predictions can be used as labels to train it again. I think a homography transformation can't add information to the image, so the model can't learn new knowledge from the warped image. Why does it work?

rpautrat commented 1 year ago

The model does not learn new knowledge after warping the image, but it "sees" the image from a different perspective. Maybe a corner is distorted in the original image and cannot be detected by the network, while after warping the corner will look different (for example, it will appear similar to the corners that were present in the synthetic dataset), and the network will be able to detect it.
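
A small numerical illustration of this point, as a toy sketch rather than anything taken from SOLD2: the helper names `apply_h` and `corner_angle` and the specific shear are assumptions chosen for the example. A corner with a narrow 30 degree opening becomes a right-angle, L-shaped corner under a suitable warp, which is the kind of junction a synthetically trained detector is more likely to recognize.

```python
import numpy as np

def apply_h(H, pt):
    """Apply a 3x3 homography to a 2D point."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

def corner_angle(H, vertex, arm_a, arm_b):
    """Opening angle (degrees) of a corner after warping its points by H."""
    v, a, b = (apply_h(H, p) for p in (vertex, arm_a, arm_b))
    u1, u2 = a - v, b - v
    cos = u1 @ u2 / (np.linalg.norm(u1) * np.linalg.norm(u2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# A corner at the origin whose two arms are 30 degrees apart.
vertex = (0.0, 0.0)
arm_a = (1.0, 0.0)
arm_b = (np.cos(np.radians(30)), np.sin(np.radians(30)))

# A horizontal shear (a special case of a homography), picked for this example.
shear = np.array([[1.0, -np.sqrt(3.0), 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

angle_before = corner_angle(np.eye(3), vertex, arm_a, arm_b)  # about 30 degrees
angle_after = corner_angle(shear, vertex, arm_a, arm_b)       # about 90 degrees
```

The image content is the same in both cases; only its appearance to the detector changes.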