Celebrandil / CudaSift

A CUDA implementation of SIFT for NVidia GPUs (1.2 ms on a GTX 1060)
MIT License

Repeatability test #61

Status: Open. Celebrandil opened this issue 5 years ago.

Celebrandil commented 5 years ago

After seeing the paper "PopSift: a faithful SIFT implementation for real-time applications", in which the authors claim that CudaSift performs exceptionally poorly with respect to scale changes, I got a bit worried and ran some tests to verify the claim, using the benchmark code from the paper "A comparison of affine region detectors". Unfortunately, I didn't manage to replicate their results.

The graphs below show the repeatability and the number of correspondences for image pairs in the 'bark' image set. The exact number of correspondences can vary quite a bit depending on the threshold you set, but the repeatability should be relatively stable. Also note that I didn't upscale the image in this test, and that I haven't yet tried to benchmark the descriptor. If CudaSift performs worse than, e.g., VLFeat, it's much more likely to be due to the descriptor. I don't really know why the results differ, but if someone has a clue, I would be glad to hear it.

[Figures: Repeatability, 9-point filters; Number of correspondences, 9-point filters; Features from the first 'bark' image; Features from the last 'bark' image]
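
For readers who want to reproduce the measure: in "A comparison of affine region detectors", repeatability is, roughly, the number of correspondences divided by the smaller number of features found in the part of the scene visible in both images. The Python sketch below is a simplified, point-only version under stated assumptions: it substitutes a pixel distance threshold (the hypothetical parameter `dist_thresh`, 1.5 px here) for the benchmark's elliptical region overlap error, and it matches points greedily one-to-one. All function names are hypothetical; this is not the benchmark code itself.

```python
import math

def inv3(m):
    """Inverse of a 3x3 matrix (nested lists) via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def project(m, p):
    """Map point p = (x, y) through the homography m."""
    x, y = p
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return (u / w, v / w)

def inside(p, width, height):
    return 0 <= p[0] < width and 0 <= p[1] < height

def repeatability(kps1, kps2, H, size1, size2, dist_thresh=1.5):
    """Simplified point-based repeatability (assumption: no region overlap test).

    H maps image-1 coordinates to image-2 coordinates; size1 and size2 are
    (width, height). Returns (repeatability, number_of_correspondences).
    """
    H_inv = inv3(H)
    # Keep only features whose location is visible in the other image too.
    proj1 = [project(H, p) for p in kps1]
    common1 = [p for p in proj1 if inside(p, *size2)]
    common2 = [q for q in kps2 if inside(project(H_inv, q), *size1)]
    # Candidate pairs closer than the threshold, shortest distance first.
    pairs = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(common1)
        for j, q in enumerate(common2)
        if math.dist(p, q) < dist_thresh
    )
    # Greedy one-to-one assignment: each feature is used at most once.
    used1, used2, matches = set(), set(), 0
    for _, i, j in pairs:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            matches += 1
    denom = min(len(common1), len(common2))
    return (matches / denom if denom else 0.0), matches

# Toy example: image 2 is image 1 shifted 5 px to the right.
H = [[1, 0, 5], [0, 1, 0], [0, 0, 1]]
kps1 = [(10, 10), (20, 20), (30, 30)]
kps2 = [(15.5, 10), (25, 21), (90, 90)]
rep, n = repeatability(kps1, kps2, H, (100, 100), (100, 100))
print(f"repeatability={rep:.3f}, correspondences={n}")
```

Dividing by the smaller feature count in the common region, rather than by the total, is what keeps the measure fairly stable when the detection threshold changes the absolute number of features.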

Celebrandil commented 5 years ago

I made some tests with 17-point filters for the detector, rather than the 9-point filters that I've used so far. It's true that with a 9-point filter you truncate a bit of the Gaussians used for the larger scales within an octave, but many years ago I concluded that this shouldn't matter much. With 17-point filters, the time consumption goes up by about 0.2 ms on a 1080 Ti.
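
To get a feel for how much of a Gaussian a fixed-length filter cuts off, the sketch below computes the fraction of a continuous 1-D Gaussian's mass that falls inside a 9-tap or 17-tap window, treating each tap as covering one pixel. The sigma values are illustrative assumptions, not the actual per-scale values used in CudaSift, and the continuous approximation ignores sampling effects.

```python
import math

def captured_mass(sigma, taps):
    """Fraction of a continuous 1-D Gaussian's mass inside a window of `taps` pixels.

    Each tap is assumed to cover one pixel, so a 9-tap filter spans [-4.5, 4.5].
    """
    half_width = taps / 2.0
    return math.erf(half_width / (sigma * math.sqrt(2.0)))

# Illustrative blur sigmas (assumption), not CudaSift's per-scale values.
for sigma in (1.6, 2.0, 2.5, 3.2, 4.0):
    print(f"sigma={sigma:.1f}  9-tap: {captured_mass(sigma, 9):.4f}  "
          f"17-tap: {captured_mass(sigma, 17):.4f}")
```

The captured fraction falls off as sigma grows, which is why the truncation only becomes noticeable for the larger scales within an octave.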

The number of features also goes down a bit with the same threshold. This is because the filters are normalized to sum to one, so with longer filters the peak decreases a little. As for the repeatability going up at a scale difference of 4 with the 9-point filters, I believe it's just an artifact of the low number of features in the overlap between the images.
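
The normalization effect can be illustrated with a sampled Gaussian: since the centre sample has weight exp(0) = 1 before normalization, the normalized centre tap equals 1 / sum(weights), and adding taps can only increase that sum. The sketch below assumes a plain sampled Gaussian and an illustrative sigma of 3.2; CudaSift's actual kernels may be built differently. It only shows the kernel's own peak weight. How much the DoG response, and hence the number of features above threshold, drops depends on the image content.

```python
import math

def normalized_kernel(sigma, taps):
    """Gaussian sampled at integer offsets, normalized so the taps sum to one."""
    r = taps // 2
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-r, r + 1)]
    total = sum(w)
    return [v / total for v in w]

sigma = 3.2  # illustrative value (assumption), not taken from CudaSift
k9 = normalized_kernel(sigma, 9)
k17 = normalized_kernel(sigma, 17)
# Centre tap = 1 / sum(weights); more taps give a larger sum, hence a lower peak.
peak9, peak17 = k9[len(k9) // 2], k17[len(k17) // 2]
# For a separable 2-D filter, the centre weight is the product of the 1-D centres.
print(f"1-D peak: {peak9:.4f} -> {peak17:.4f}, "
      f"2-D peak: {peak9 ** 2:.5f} -> {peak17 ** 2:.5f}")
```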

[Figures: Repeatability, 17-point filters; Number of correspondences, 17-point filters]