Closed qidihan closed 3 months ago
The main ways to improve runtime are to decrease the resolution and lower the number of keypoints. I'm surprised that the descriptor is that much slower than the detector, though. On my 2080Ti, the detector takes 0.06 seconds, and the descriptor 0.05 seconds, with a resolution of 784x784. The key to making the matching faster is to use the SubsetMatcher or MaxSimilarityMatcher instead of the MaxMatchesMatcher.
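For intuition on why similarity-based matching is cheap, here is a minimal NumPy sketch of mutual nearest-neighbour matching on L2-normalized descriptors. This is only an illustration of the general idea, not DeDoDe's actual SubsetMatcher or MaxSimilarityMatcher implementation:

```python
import numpy as np

def mutual_nn_match(desc_a, desc_b):
    # Sketch of mutual nearest-neighbour matching: one matrix product plus
    # two argmax passes. Assumes rows of desc_a/desc_b are L2-normalized.
    sim = desc_a @ desc_b.T            # cosine similarities, shape (Na, Nb)
    nn_ab = sim.argmax(axis=1)         # best match in B for each A keypoint
    nn_ba = sim.argmax(axis=0)         # best match in A for each B keypoint
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a     # keep only mutually-best pairs
    return idx_a[mutual], nn_ab[mutual]

# Toy usage: descriptors of image B are a permutation of image A's.
ia, ib = mutual_nn_match(np.eye(3), np.eye(3)[[1, 0, 2]])
# → ia = [0, 1, 2], ib = [1, 0, 2]
```

The whole cost is a single (Na, Nb) similarity matrix, which is why this style of matcher scales well with keypoint count.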
Thanks for responding!! I am experiencing an issue where the descriptor for the first image takes approximately 0.02 seconds, but the descriptor for the second image requires 0.11 seconds. I don't understand why the normalization step takes so much time on the second image. Do you have any suggestions to address this issue? My images have a resolution of 640x360 with 500 matching points.
Additionally, I attempted to use ONNX to accelerate the process but encountered difficulties (https://github.com/fabio-sim/DeDoDe-ONNX-TensorRT/issues/8). I'm currently uncertain about how to integrate the steerers and matcher into a single ONNX model. If you have any advice or solutions, I would greatly appreciate your help.
Thanks, Qidi
I don't know much about ONNX, unfortunately. For the descriptor timing, I'm unsure; do you have a code snippet that you use for timing?
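One common cause of this exact symptom is that CUDA kernels launch asynchronously: without a synchronize call, the first measurement only records launch overhead, and the second measurement absorbs the queued work of the first. A minimal timing harness, assuming you pass torch.cuda.synchronize as the barrier when timing GPU work (the helper itself is framework-agnostic):

```python
import time

def timed(fn, *args, warmup=3, iters=10, sync=None):
    # Average wall-clock time per call. `sync` should be a blocking barrier
    # such as torch.cuda.synchronize when fn runs GPU kernels; CUDA work is
    # asynchronous, so timing without it attributes one call's work to the next.
    for _ in range(warmup):      # warmup excludes one-time costs (context init, autotune)
        fn(*args)
    if sync:
        sync()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync:
        sync()                   # wait for all queued work before stopping the clock
    return (time.perf_counter() - t0) / iters

# CPU-only usage example; on GPU, pass sync=torch.cuda.synchronize.
avg = timed(lambda: sum(range(1000)))
```

If the 0.02 s / 0.11 s split disappears under this harness, the descriptor itself is equally fast on both images and the earlier numbers were a measurement artifact.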
As for getting incorrect results in the DeDoDe-ONNX-TensorRT issue, I believe it is due to a mismatch between descriptor and steerer, i.e. you should use the setting C steerer_generator here:
descriptor = dedode_descriptor_B(weights=torch.load("model_weights/B_C4_Perm_descriptor_setting_C.pth"))
steerer_generator = torch.load("model_weights/B_C4_steerer_setting_A.pth")
This is the code for time consumption. time1 is the detection time, time2 is the descriptor time, and time3 is the matcher.match function time.
When I use setting C in the ONNX export, I still get incorrect results.
descriptor = dedode_descriptor_B(weights=torch.load("model_weights/B_C4_Perm_descriptor_setting_C.pth"))
steerer_generator = torch.load("model_weights/B_C4_Perm_steerer_setting_C.pth")
The result is totally wrong.
I found a fix, but I don't understand why it works. I changed the detector's encoder = VGG(size='19') to encoder = VGG19() in the following code. Do you know the reason for this? I thought VGG(size='19') was equivalent to VGG19().
def DeDoDeDetectorL(weights=None, num_keypoints=10000):
    NUM_PROTOTYPES = 1
    residual = True
    hidden_blocks = 8
    ......
    encoder = VGG19()
    decoder = Decoder(conv_refiner)
    model = DeDoDeDetector(
        encoder=encoder, decoder=decoder, num_keypoints=num_keypoints
    )
    model.load_state_dict(
        weights
        if weights is not None
        else torch.hub.load_state_dict_from_url(MODEL_URLS["dedode_detector_L"])
    )
    return model
So you got TensorRT running? I don't know why that change would do anything, to be honest. It seems like VGG19 is not included in the DeDoDe-ONNX-TensorRT repo?
Not yet, I've just finished exporting the ONNX file. I'm not an expert on TensorRT and am trying to learn it at the moment. I can upload the ONNX export script if you need it; I'm just not sure if it's correct.
Hi,
Thank you for sharing this exceptional paper! I've conducted some practical tests and discovered that the detector takes around 0.02 seconds, the B_C4_Perm_descriptor_setting_C descriptor takes around 0.11 seconds, and the MaxMatchesMatcher takes around 0.11 seconds (I used a 3060 12GB GPU). I was wondering if there are any potential methods to accelerate these components?
Thanks, Qidi