Open · lyp-deeplearning opened 1 month ago

Thank you for providing the source code for this interesting work. However, I have a question regarding the inference time. On my device (RTX 3090, 24 GB), a single inference takes 2.92 seconds (averaged over 100 runs), whereas the paper reports about 50 fps. I look forward to your response.
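(For reference, the usual way to time GPU inference in PyTorch is to add warm-up iterations and explicit synchronization, since CUDA calls are asynchronous and a naive timer can otherwise measure little more than kernel launches. A minimal sketch, assuming a CUDA device; `extract_fn` and `image` are placeholders, not names from this repo:)

```python
import time
import torch

def benchmark(extract_fn, image, runs=100, warmup=10):
    """Average per-call latency in seconds; assumes extract_fn runs on a CUDA GPU."""
    for _ in range(warmup):
        extract_fn(image)        # warm-up: cuDNN autotuning, caches, lazy init
    torch.cuda.synchronize()     # drain pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(runs):
        extract_fn(image)
    torch.cuda.synchronize()     # wait for the last kernels before stopping the clock
    return (time.perf_counter() - start) / runs
```

Measured this way, the paper's ~50 fps would correspond to roughly 0.02 s per call, against the 2.92 s observed above.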
The released code is based on TensorFlow and does not use any efficient transformer implementation. The reported numbers come from a re-implementation in PyTorch using glue-factory.
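(As one illustration of what an "efficient transformer implementation" can mean: PyTorch 2.x can dispatch attention to fused, FlashAttention-style kernels via `torch.nn.functional.scaled_dot_product_attention`, whereas a naive implementation materializes the full token-by-token score matrix. A minimal comparison sketch with placeholder shapes; this is not a claim about what glue-factory does internally:)

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(1, 8, 1024, 64, device=device)  # (batch, heads, tokens, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# Naive attention: builds the full (tokens x tokens) score matrix in memory.
scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
naive = scores.softmax(dim=-1) @ v

# Fused attention: may use a FlashAttention-style kernel on supported GPUs.
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(naive, fused, atol=1e-4))  # same math, different kernels
```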
@lyp-deeplearning one other thing to mention:
The TF models should take advantage of the GPU automatically, but the PyTorch DINOv2 code needs some modifications to dino_extract.py:

```python
self.model.cuda()
```

i.e., send the model to GPU memory, and

```python
out = self.model.get_intermediate_layers(image.cuda(), n=self.feature_layer)[0]
```

i.e., send the image to GPU memory with a `.cuda()` call.

After this, hopefully all models run on the GPU and you should see some inference latency improvements.
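(For context, a minimal sketch of how those two edits might sit together in an extractor class. Everything here other than the two quoted lines — the class name, constructor arguments, and the CPU fallback — is an assumption, not the repo's actual dino_extract.py:)

```python
import torch

class DinoExtract:
    """Hypothetical wrapper around a DINOv2 backbone; only the two
    .cuda() calls below are taken from the comment above."""

    def __init__(self, model, feature_layer):
        self.model = model.eval()
        self.feature_layer = feature_layer
        if torch.cuda.is_available():
            self.model.cuda()  # send the model weights to GPU memory

    @torch.no_grad()
    def extract(self, image):
        if torch.cuda.is_available():
            image = image.cuda()  # send the input image to GPU memory too
        # DINOv2's get_intermediate_layers returns a tuple of feature maps;
        # [0] keeps the first requested layer, as in the snippet above.
        return self.model.get_intermediate_layers(image, n=self.feature_layer)[0]
```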