Owen-Liuyuxuan / ros2_vision_inference

Unified multi-threading inference nodes for monocular 3D object detection, depth prediction and semantic segmentation

inference time for mono3d_yolox_576_768.onnx #2

Closed s95huang closed 3 months ago

s95huang commented 3 months ago

Hello, thanks for this work. I am currently testing mono3d_yolox_576_768.onnx on a single KITTI dataset image. However, I have observed that the model inference is very slow, about 0.9 seconds per image. Since the model input is quite large, can I ask if you observe similar behavior?

I have set onnxruntime to GPU mode with the following providers:

import time
import onnxruntime as ort

# request the CUDA execution provider on GPU 0
providers = [("CUDAExecutionProvider", {"cudnn_conv_use_max_workspace": '0', 'device_id': '0'})]
ort_session = ort.InferenceSession("mono3d_yolox_576_768.onnx", providers=providers)
print(ort.get_device())

# run inference on the preprocessed image and its camera matrix P2
start_time = time.time()
output = ort_session.run(None, {'image': input_numpy, 'P2': P_numpy})
end_time = time.time()
print('inference time: ', end_time - start_time)

The output on an RTX 3090 is:

2024-03-11 17:50:31.071730836 [W:onnxruntime:, session_state.cc:1030 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-11 17:50:31.071745753 [W:onnxruntime:, session_state.cc:1032 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

GPU
[1, 3, 576, 768]
inference time:  0.9390156269073486
[array([0.39038152, 0.35984343, 0.3046152 , 0.2919413 , 0.26462984,
       0.23632818, 0.21777868], dtype=float32), array([[ 3.49242340e+02,  1.11306046e+02,  4.07652374e+02,
         1.65821838e+02, -7.44742060e+00, -1.13437545e+00,
         2.28757629e+01,  1.49374330e+00,  1.38556397e+00,
         3.45798755e+00, -1.59693551e+00, -1.90921402e+00],
       [ 5.60871582e+02,  1.08445183e+02,  5.77906494e+02,
         1.50540161e+02, -1.82073689e+00, -1.90546668e+00,
         3.16481533e+01,  6.32410765e-01,  1.97664857e+00,
         6.53813183e-01,  1.80945432e+00,  1.75394523e+00],
       [ 3.29188354e+02,  1.12920074e+02,  3.45358734e+02,
         1.25391060e+02, -3.42703705e+01, -6.76556158e+00,
         9.08311920e+01,  1.66698575e+00,  1.54468882e+00,
         3.99142098e+00, -1.65410089e+00, -2.01428485e+00],
       [ 0.00000000e+00,  1.20772865e+02,  3.08910408e+01,
         1.45976822e+02, -3.52112808e+01, -2.28625989e+00,
         4.19187546e+01,  1.71018600e+00,  1.48972130e+00,
         4.22642040e+00,  1.28764760e+00,  5.89864612e-01],
       [ 5.98880249e+02,  1.08602348e+02,  6.14480957e+02,
         1.44978394e+02, -2.24200889e-01, -2.17929220e+00,
         3.42271881e+01,  6.25427425e-01,  1.76455605e+00,
         6.94419742e-01, -1.76615596e+00, -1.77088988e+00],
       [ 2.60345032e+02,  1.11394417e+02,  2.95034607e+02,
         1.39506851e+02, -1.96635494e+01, -2.85144591e+00,
         4.27154388e+01,  1.59680080e+00,  1.50988996e+00,
         3.92449903e+00, -1.40047324e+00, -1.83068955e+00],
       [ 3.15800598e+02,  1.16769684e+02,  3.27755768e+02,
         1.27540756e+02, -3.60040283e+01, -6.38722181e+00,
         9.01799850e+01,  1.67425752e+00,  1.62000513e+00,
         4.12651396e+00, -1.14637792e+00, -1.52563965e+00]], dtype=float32), array([0, 5, 0, 0, 5, 0, 0], dtype=int64)]
Owen-Liuyuxuan commented 3 months ago

The first run is expected to be slow; it goes through warm-up and memory preallocation. If you want to check the speed of continuous inference, please measure the runtime of the second run onward (in the same Python session).
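For reference, a minimal timing sketch along these lines (assuming the ort_session, input_numpy and P_numpy objects from the snippet above) discards the first warm-up call and averages the subsequent runs:

import time

# first call includes CUDA warm-up and memory preallocation, so exclude it from timing
ort_session.run(None, {'image': input_numpy, 'P2': P_numpy})

# time steady-state inference over several runs
n_runs = 20
start_time = time.time()
for _ in range(n_runs):
    ort_session.run(None, {'image': input_numpy, 'P2': P_numpy})
print('average inference time: ', (time.time() - start_time) / n_runs)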

s95huang commented 3 months ago

Thank you for your response~ Indeed, the second inference becomes much faster, as shown below:

inference time:  1.2322235107421875
inference time:  0.01637864112854004

Closing this issue