Closed qidihan closed 3 months ago
The main ways to improve runtime are to decrease the resolution and lower the number of keypoints. I'm surprised that the descriptor is that much slower than the detector, though. On my 2080Ti, the detector takes 0.06 seconds, and the descriptor 0.05 seconds, with a resolution of 784x784. The key to making the matching faster is to use the SubsetMatcher or MaxSimilarityMatcher instead of the MaxMatchesMatcher.
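For intuition on why similarity-based matching is cheap, here is a minimal NumPy sketch of mutual nearest-neighbour matching on L2-normalized descriptors. This is only an illustration of the general idea, not DeDoDe's actual SubsetMatcher or MaxSimilarityMatcher implementation:

```python
import numpy as np

def mutual_nn_match(desc_a, desc_b):
    # Sketch of mutual nearest-neighbour matching: one matrix product plus
    # two argmax passes. Assumes rows of desc_a/desc_b are L2-normalized.
    sim = desc_a @ desc_b.T            # cosine similarities, shape (Na, Nb)
    nn_ab = sim.argmax(axis=1)         # best match in B for each A keypoint
    nn_ba = sim.argmax(axis=0)         # best match in A for each B keypoint
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a     # keep only mutually-best pairs
    return idx_a[mutual], nn_ab[mutual]

# Toy usage: descriptors of image B are a permutation of image A's.
ia, ib = mutual_nn_match(np.eye(3), np.eye(3)[[1, 0, 2]])
# → ia = [0, 1, 2], ib = [1, 0, 2]
```

The whole cost is a single (Na, Nb) similarity matrix, which is why this style of matcher scales well with keypoint count.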
Thanks for responding!! I am experiencing an issue where the descriptor for the first image takes approximately 0.02 seconds, but the descriptor for the second image requires 0.11 seconds. I don't understand why the normalization step takes so much time on the second image. Do you have any suggestions to address this issue? My images have a resolution of 640x360 with 500 matching points.
Additionally, I attempted to use ONNX to accelerate the process but encountered difficulties (https://github.com/fabio-sim/DeDoDe-ONNX-TensorRT/issues/8). I'm currently uncertain about how to integrate the steerers and matcher into a single ONNX model. If you have any advice or solutions, I would greatly appreciate your help.
Thanks, Qidi
I don't know much about ONNX, unfortunately. For the descriptor timing, I'm unsure; do you have a code snippet that you use for timing?
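One common cause of this exact symptom is that CUDA kernels launch asynchronously: without a synchronize call, the first measurement only records launch overhead, and the second measurement absorbs the queued work of the first. A minimal timing harness, assuming you pass torch.cuda.synchronize as the barrier when timing GPU work (the helper itself is framework-agnostic):

```python
import time

def timed(fn, *args, warmup=3, iters=10, sync=None):
    # Average wall-clock time per call. `sync` should be a blocking barrier
    # such as torch.cuda.synchronize when fn runs GPU kernels; CUDA work is
    # asynchronous, so timing without it attributes one call's work to the next.
    for _ in range(warmup):      # warmup excludes one-time costs (context init, autotune)
        fn(*args)
    if sync:
        sync()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync:
        sync()                   # wait for all queued work before stopping the clock
    return (time.perf_counter() - t0) / iters

# CPU-only usage example; on GPU, pass sync=torch.cuda.synchronize.
avg = timed(lambda: sum(range(1000)))
```

If the 0.02 s / 0.11 s split disappears under this harness, the descriptor itself is equally fast on both images and the earlier numbers were a measurement artifact.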
As for getting incorrect results in the DeDoDe-ONNX-TensorRT issue, I believe it is due to a mismatch between descriptor and steerer, i.e. you should use the setting C steerer_generator here:
descriptor = dedode_descriptor_B(weights=torch.load("model_weights/B_C4_Perm_descriptor_setting_C.pth"))
steerer_generator = torch.load("model_weights/B_C4_steerer_setting_A.pth")
This is the code for time consumption. time1 is the detection time, time2 is the descriptor time, and time3 is the matcher.match function time.
When I use setting C in the ONNX export, I still get incorrect results.
descriptor = dedode_descriptor_B(weights=torch.load("model_weights/B_C4_Perm_descriptor_setting_C.pth"))
steerer_generator = torch.load("model_weights/B_C4_Perm_steerer_setting_C.pth")
The result is totally wrong.
I found a fix, but I don't understand why it works. I changed the detector's encoder = VGG(size='19') to encoder = VGG19() in the following code. Do you know the reason for this? I thought VGG(size='19') was equivalent to VGG19().
def DeDoDeDetectorL(weights=None, num_keypoints=10000):
    NUM_PROTOTYPES = 1
    residual = True
    hidden_blocks = 8
    ......
    encoder = VGG19()
    decoder = Decoder(conv_refiner)
    model = DeDoDeDetector(
        encoder=encoder, decoder=decoder, num_keypoints=num_keypoints
    )
    model.load_state_dict(
        weights
        if weights is not None
        else torch.hub.load_state_dict_from_url(MODEL_URLS["dedode_detector_L"])
    )
    return model
So you got TensorRT running? I don't know why that change would do anything, to be honest. It seems like VGG19 is not included in the DeDoDe-ONNX-TensorRT repo?
Not yet, I've just finished exporting the ONNX file. I'm not an expert on TensorRT and am trying to learn it at the moment. I can upload the ONNX export script if you need it; I'm just not sure if it's correct.
Hi,
Thank you for sharing this exceptional paper! I've conducted some practical tests and discovered that the detector takes around 0.02 seconds, the B_C4_Perm_descriptor_setting_C descriptor takes around 0.11 seconds, and the MaxMatchesMatcher takes around 0.11 seconds (I used a 3060 12GB GPU). I was wondering if there are any potential methods to accelerate these components?
Thanks, Qidi