Yes, the reported inference time includes the time taken for the k-NN search. It covers the entire inference process, excluding only the time spent loading images and transferring data between CPU and GPU.
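For reference, here is a minimal timing sketch, assuming `model` is the trained network and `x` is an input batch that has already been moved to the GPU (so image loading and CPU/GPU transfers are excluded, as described above). This is only an illustration, not the repository's benchmarking script.

```python
import torch

# Assumed to be in scope: `model` (trained network, on GPU) and `x` (input batch already on GPU).
model.eval()
with torch.no_grad():
    for _ in range(10):          # warm-up so kernel launch/caching overhead is not measured
        model(x)

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        model(x)                 # forward pass, including any k-NN retrieval inside the model
    end.record()
    torch.cuda.synchronize()     # wait for all queued kernels before reading the timer

elapsed_ms = start.elapsed_time(end) / 100
print(f"mean inference time: {elapsed_ms:.2f} ms ({1000.0 / elapsed_ms:.1f} FPS)")
```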
Thank you for your reply. This is meaningful work for real-world applications!
I successfully trained a model on custom data and achieved satisfactory results. Afterward, I converted the model and the "unfold post-process" module to ONNX format. However, during ONNX inference, and particularly during the "unfold post-process" operation, I am seeing very long execution times (around 830 ms).
Could you please share the TensorRT demo script used to test inference time? The "unfold" operation seems to be very slow in ONNX inference.
To troubleshoot the performance issue, first confirm the performance of the PyTorch model. On an RTX 4090 GPU you should expect around 113 FPS; even with other overhead included, the model should still run at 50 FPS or more, which means a single "unfold" operation should not take 800 ms. For optimizing inference time with TensorRT, you can refer to Torch-TensorRT.
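As a starting point, a Torch-TensorRT compilation can look roughly like the sketch below. The input shape (1, 3, 512, 512) and FP16 precision are assumptions for illustration, not values taken from this repository, and `model` is assumed to be the trained PyTorch model.

```python
import torch
import torch_tensorrt

# Assumed: `model` is the trained PyTorch model; shape and precision are placeholders.
model = model.eval().cuda()

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 512, 512), dtype=torch.float32)],
    enabled_precisions={torch.float16},   # allow FP16 kernels for extra speed
)

x = torch.randn(1, 3, 512, 512, device="cuda")
with torch.no_grad():
    out = trt_model(x)                    # runs through the TensorRT-compiled graph
```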
Thank you very much. This is not an issue with the paper itself: the PyTorch version runs at the reported speed and matches the paper exactly. However, there is a problem during inference with the converted ONNX model. I suspect the exported ONNX graph may be handling the unfold operation with a for-loop, which could be causing the slowdown.
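To check that suspicion, the exported graph can be inspected as in the hedged sketch below; "model.onnx" is a placeholder path, and counting Loop/Slice/Gather nodes is only a heuristic for how `unfold` may have been lowered by the exporter, not something confirmed by this repository.

```python
from collections import Counter

import onnx

# Placeholder path: point this at the exported model.
m = onnx.load("model.onnx")

# Count the operator types in the exported graph.
op_counts = Counter(node.op_type for node in m.graph.node)
print(op_counts.most_common(20))

# Many Loop / Slice / Gather / Concat nodes suggest that unfold was decomposed
# element-wise instead of being exported as a single operator.
if "Loop" in op_counts:
    print("Graph contains Loop nodes: unfold was likely exported as a loop.")
```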
Hi, first of all, thank you for the great work on this project!
I have a question regarding the inference speed. Specifically, I want to clarify whether the reported inference time includes the time taken for the k-NN search.
From my understanding, the inference stage involves computing the retrieval indexes (the k-NN search). This computation could be time-consuming; is it included in the reported inference time?
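For context, below is an illustrative sketch (not this repository's implementation) of a brute-force GPU k-NN retrieval using `torch.cdist` and `topk`; the feature dimensions, memory-bank size, and value of k are arbitrary assumptions, chosen only to show what the retrieval-index computation looks like and how its time can be measured.

```python
import time

import torch

# Arbitrary assumed sizes: 1024 queries, a 10k-entry memory bank, 256-d features, k = 9.
queries = torch.randn(1024, 256, device="cuda")
memory = torch.randn(10000, 256, device="cuda")
k = 9

torch.cuda.synchronize()
t0 = time.perf_counter()
dists = torch.cdist(queries, memory)                # pairwise L2 distances
_, knn_idx = dists.topk(k, dim=1, largest=False)    # indexes of the k nearest neighbours
torch.cuda.synchronize()                            # wait for GPU work before stopping the clock
print(f"k-NN search: {(time.perf_counter() - t0) * 1000:.2f} ms")
```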