flyinghu123 / CPR

This is an official implementation of the paper: "Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval" (accepted by IEEE TIP).
https://arxiv.org/abs/2308.06748v1

Clarification on Inference Speed and k-NN Search Process #13

Closed Apostatee closed 2 months ago

Apostatee commented 2 months ago

Hi! First of all, thank you for the great work on this project!

I have a question regarding the inference speed. Specifically, I want to clarify whether the reported inference time includes the time taken for the k-NN search.

From my understanding, during the inference stage, the process involves:

  1. Extracting the features from the test image
  2. Collecting the features of the k nearest neighbors, according to the retrieval indexes, as the template
  3. Comparing the test features with the template features to make the final prediction.

However, could the retrieval index computation be time-consuming and perhaps not included in the reported inference time?
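
To make the question concrete, here is a minimal sketch of the pipeline as I understand it (`backbone`, `memory_bank`, and the scoring rule are placeholders of mine, not the repository's actual code):

```python
import torch

@torch.no_grad()
def infer(backbone, memory_bank, image, k=10):
    # 1. Extract patch features from the test image: (n_patches, D)
    feats = backbone(image.unsqueeze(0)).flatten(2).squeeze(0).T

    # 2. Retrieve the k nearest neighbors from the memory bank as the template
    dists = torch.cdist(feats, memory_bank)               # (n_patches, n_bank)
    knn_dists, knn_idx = dists.topk(k, dim=1, largest=False)

    # 3. Compare the test features with the template to get per-patch scores
    patch_scores = knn_dists.mean(dim=1)                  # (n_patches,)
    return patch_scores.max(), patch_scores               # image score, anomaly map
```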

flyinghu123 commented 2 months ago

Yes, the reported inference time includes the time taken for the k-NN search. It covers the entire inference process, excluding only image loading and the CPU-to-GPU and GPU-to-CPU data transfers.
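
For reference, a measurement along those lines can be made like this (a simplified sketch, not the repository's exact benchmark code; `model` wraps the full pipeline, k-NN search included):

```python
import time
import torch

@torch.no_grad()
def time_inference(model, x_gpu, n_warmup=20, n_runs=100):
    # x_gpu is already on the GPU, so image loading and host<->device
    # copies are excluded from the measurement.
    for _ in range(n_warmup):
        model(x_gpu)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(x_gpu)                  # the k-NN search runs inside forward()
    torch.cuda.synchronize()          # wait for all queued GPU work to finish
    return (time.perf_counter() - t0) / n_runs
```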

Apostatee commented 2 months ago

Thank you for your reply. This is meaningful work for real-world applications!

Apostatee commented 2 months ago

I successfully trained a model on custom data and achieved satisfactory results. Afterward, I converted the model and the "unfold post-process" module to ONNX format. However, during ONNX inference, particularly in the "unfold post-process" operation, I am seeing very long execution times (around 830 ms).

Could you please share the TensorRT demo script used to test inference time? The "unfold" operation seems to be very slow in ONNX inference.
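
For anyone debugging the same thing, onnxruntime's built-in profiler can confirm which node is the bottleneck; a rough sketch (the model path and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_profiling = True                    # dump per-node timings to JSON
sess = ort.InferenceSession("cpr.onnx", opts,
                            providers=["CUDAExecutionProvider"])

x = np.random.randn(1, 3, 320, 320).astype(np.float32)  # adjust to your input
sess.run(None, {sess.get_inputs()[0].name: x})
print(sess.end_profiling())                     # prints the profile file path
```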

flyinghu123 commented 2 months ago

To troubleshoot the performance issue, first confirm the performance of the PyTorch model. On a 4090 GPU you should see around 113 FPS; even with other overhead included, the model should still run at 50+ FPS, so a single "unfold" operation should not take 800 ms. For optimizing inference time with TensorRT, you can refer to Torch-TensorRT.
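
A minimal Torch-TensorRT sketch looks like the following (the input shape is a placeholder; adjust it to your model):

```python
import torch
import torch_tensorrt

model = model.eval().cuda()
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 320, 320))],  # placeholder shape
    enabled_precisions={torch.half},                  # allow FP16 kernels
)
out = trt_model(torch.randn(1, 3, 320, 320).cuda())
```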

Apostatee commented 2 months ago

Thank you very much. To be clear, nothing is wrong on the paper's side: the PyTorch version shows good speed and conforms exactly to the paper. The problem only appears during inference with the converted ONNX model. I suspect the exported ONNX graph handles the unfold operation with a for-loop, which could be causing the slowdown.
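
If that is the case, one common workaround (assuming the slow node really is the exported unfold) is to express unfold as a convolution with one-hot kernels, which exports to a single Conv node:

```python
import torch
import torch.nn.functional as F

def unfold_via_conv(x, kernel_size, stride=1, padding=0):
    # Drop-in equivalent of F.unfold(x, kernel_size, stride=stride, padding=padding)
    # for the dilation=1 case, but ONNX-export friendly.
    b, c, h, w = x.shape
    k = kernel_size
    # One-hot kernels: output channel (ci*k*k + i*k + j) copies input channel ci
    # at kernel offset (i, j), matching F.unfold's channel ordering.
    weight = torch.eye(c * k * k, device=x.device, dtype=x.dtype).view(c * k * k, c, k, k)
    out = F.conv2d(x, weight, stride=stride, padding=padding)  # (B, C*k*k, H', W')
    return out.flatten(2)                                      # (B, C*k*k, L)
```

Checking `torch.allclose(unfold_via_conv(x, 3, padding=1), F.unfold(x, 3, padding=1))` on a random tensor is a quick sanity test before swapping it into the model.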