fabio-sim / LightGlue-ONNX

ONNX-compatible LightGlue: Local Feature Matching at Light Speed. Supports TensorRT, OpenVINO
Apache License 2.0

The output shape of LightGlue's ONNX model is dynamic. Does TensorRT support dynamic output? #59

Closed · weihaoysgs closed 11 months ago

weihaoysgs commented 11 months ago

@fabio-sim Hi, I noticed that in the new script you uploaded for running LightGlue inference with TensorRT, the output shapes are set to fixed values:

# TODO: Still haven't figured out dynamic output shapes yet:
if binding == "matches0":
    shape = (512, 2)
elif binding == "mscores0":
    shape = (512,)

However, the ONNX model seems to be exported with dynamic axes, so I have the following two questions:

  1. What happens if the number of matches exceeds 512? Then again, the maximum number of keypoints configured in the model you are using is 512, so the match count cannot exceed it. In other words, should the output dimension simply be set as large as possible?

  2. Since a fixed value is used at inference time anyway, can the output be given a fixed shape directly when exporting the ONNX model? How would that be done? (See the sketch below for what I mean.)
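For context, what I imagine by a fixed-shape export is simply omitting dynamic_axes, so every dimension is frozen to the sizes of the example inputs. A minimal sketch (the 512-keypoint and 256-dim descriptor sizes are assumptions following the repo's defaults):

import torch

# Hypothetical fixed-size example inputs.
kpts0 = torch.zeros(1, 512, 2)
kpts1 = torch.zeros(1, 512, 2)
desc0 = torch.zeros(1, 512, 256)
desc1 = torch.zeros(1, 512, 256)

torch.onnx.export(
    lightglue,  # the matcher module, as in export.py
    (kpts0, kpts1, desc0, desc1),
    "lightglue_fixed.onnx",
    input_names=["kpts0", "kpts1", "desc0", "desc1"],
    output_names=["matches0", "mscores0"],
    opset_version=17,
    # No dynamic_axes: every input dimension is baked in at export time.
)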

Because I encountered the following problem when using the TensorRT C++ interface for inference, I guess it may be caused by the above.

[12/18/2023-23:29:12] [I] [TRT] Loaded engine size: 52 MiB
[12/18/2023-23:29:12] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +43, now: CPU 0, GPU 43 (MiB)
[12/18/2023-23:29:12] [I] [TRT] Loaded engine size: 3 MiB
[12/18/2023-23:29:12] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +2, now: CPU 0, GPU 45 (MiB)
[12/18/2023-23:29:12] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +582, now: CPU 0, GPU 627 (MiB)
[12/18/2023-23:29:12] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +582, now: CPU 0, GPU 627 (MiB)
[12/18/2023-23:29:13] [I] [TRT] [MS] Running engine with multi stream info
[12/18/2023-23:29:13] [I] [TRT] [MS] Number of aux streams is 2
[12/18/2023-23:29:13] [I] [TRT] [MS] Number of total worker streams is 3
[12/18/2023-23:29:13] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[12/18/2023-23:29:13] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1137, now: CPU 0, GPU 1182 (MiB)
[12/18/2023-23:29:13] [E] [TRT] 1: [runner.cpp::executeMyelinGraph::715] Error Code 1: Myelin ([myelinGraphExecute] Called without resolved dynamic shapes.)

Looking forward to your reply!

weihaoysgs commented 11 months ago

I tried changing the export.py script as follows:

torch.onnx.export(
    lightglue,
    (kpts0, kpts1, desc0, desc1),
    lightglue_path,
    input_names=["kpts0", "kpts1", "desc0", "desc1"],
    output_names=["matches0", "mscores0"],
    opset_version=17,
    dynamic_axes={
        "kpts0": {1: "num_keypoints0"},
        "kpts1": {1: "num_keypoints1"},
        "desc0": {1: "num_keypoints0"},
        "desc1": {1: "num_keypoints1"},
        # "matches0": {0: "num_matches0"},
        # "mscores0": {0: "num_matches0"},
    },
)

I commented out the dynamic axes for "matches0" and "mscores0", but the exported ONNX model still seems to have dynamic outputs?


fabio-sim commented 11 months ago

Hi @weihaoysgs

I'm no expert at TensorRT, so I'm also still not sure how to make dynamic output shapes work there. However, I suspect that the following error is about a different issue.

[12/18/2023-23:29:13] [E] [TRT] 1: [runner.cpp::executeMyelinGraph::715] Error Code 1: Myelin ([myelinGraphExecute] Called without resolved dynamic shapes.)

At runtime, a shape still needs to be set for the inputs, e.g.: https://github.com/fabio-sim/LightGlue-ONNX/blob/bcf96b76f9d838bc1240da79442001df97be341f/trt_infer.py#L103-L104
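For illustration, a minimal Python sketch of what "setting the input shapes at runtime" looks like with the TensorRT 8.x Python API (the binding names and keypoint counts here are assumptions, not taken verbatim from the repo):

import tensorrt as trt

# Assumed: `engine` is an ICudaEngine and `context` its IExecutionContext,
# both already created from the serialized plan.
num_kpts0, num_kpts1 = 512, 512  # illustrative values
context.set_binding_shape(engine.get_binding_index("kpts0"), (1, num_kpts0, 2))
context.set_binding_shape(engine.get_binding_index("kpts1"), (1, num_kpts1, 2))
context.set_binding_shape(engine.get_binding_index("desc0"), (1, num_kpts0, 256))
context.set_binding_shape(engine.get_binding_index("desc1"), (1, num_kpts1, 256))
assert context.all_binding_shapes_specified  # inputs must be resolved before enqueue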

Regarding the ONNX model, regardless of whether dynamic axes were specified during export or not, the output is still dynamic due to the filter_matches() function here: https://github.com/fabio-sim/LightGlue-ONNX/blob/bcf96b76f9d838bc1240da79442001df97be341f/lightglue_onnx/lightglue.py#L204-L228
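As a tiny illustration of why such an output is data-dependent (this is just the general pattern, not the repository's exact code): boolean-mask indexing yields a tensor whose length depends on the tensor's values, so no fixed output shape exists at export time.

import torch

scores = torch.rand(512)      # per-correspondence confidence (illustrative)
indices = torch.arange(512)
valid = scores > 0.2          # which entries survive depends on the data
matches = indices[valid]      # shape (K,) where K varies from run to run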

One way to avoid this and have a computable (shape-dependent, but no longer data-dependent) output shape is to perform this filtering as post-processing outside the model, similar to https://github.com/fabio-sim/LightGlue-ONNX/issues/58.
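As a rough sketch of what that post-processing could look like outside the model (assuming the graph is cut to return the full soft-assignment score matrix instead; this mirrors the mutual-nearest-neighbor logic but is not the repository's exact function):

import numpy as np

def filter_matches(scores: np.ndarray, threshold: float = 0.1):
    # scores: (num_kpts0, num_kpts1) matrix of log-assignment scores (assumed).
    max0 = scores.argmax(1)                  # best column per row
    max1 = scores.argmax(0)                  # best row per column
    rows = np.arange(scores.shape[0])
    mutual = max1[max0] == rows              # keep only mutual nearest neighbors
    mscores = np.exp(scores[rows, max0])     # confidence of each row's best match
    valid = mutual & (mscores > threshold)
    matches = np.stack([rows[valid], max0[valid]], axis=-1)
    return matches, mscores[valid]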

weihaoysgs commented 11 months ago

@fabio-sim Hi, I have set the dynamic input shapes like this:

// Look up the binding indices by tensor name on the deserialized engine.
const int keypoints_0_index = mEngine->getBindingIndex(lgConfig.inputTensorNames[0].c_str());
const int keypoints_1_index = mEngine->getBindingIndex(lgConfig.inputTensorNames[1].c_str());
const int descriptors_0_index = mEngine->getBindingIndex(lgConfig.inputTensorNames[2].c_str());
const int descriptors_1_index = mEngine->getBindingIndex(lgConfig.inputTensorNames[3].c_str());

const int output_matcher0_index = mEngine->getBindingIndex(lgConfig.outputTensorNames[0].c_str());
const int output_score0_index = mEngine->getBindingIndex(lgConfig.outputTensorNames[1].c_str());

// Resolve the dynamic input dimensions for this inference call:
// (batch, num_keypoints, 2) for keypoints, (batch, num_keypoints, 256) for descriptors.
mContext->setBindingDimensions(keypoints_0_index, nvinfer1::Dims3(1, features0.cols(), 2));
mContext->setBindingDimensions(keypoints_1_index, nvinfer1::Dims3(1, features1.cols(), 2));
mContext->setBindingDimensions(descriptors_0_index, nvinfer1::Dims3(1, features0.cols(), 256));
mContext->setBindingDimensions(descriptors_1_index, nvinfer1::Dims3(1, features1.cols(), 256));

I will run more detailed tests. Thanks for your reply!

weihaoysgs commented 11 months ago

@fabio-sim Hi, thank you for your suggestion. I kept the post-processing in C++ rather than moving it into the ONNX model, and the above error disappeared. I will close this issue; if new problems come up, I will open another.