> I have tested HFNet with TF-TRT on TensorFlow 1.x and TensorFlow 2.
With TensorFlow 1.x, the dynamic-op option (`is_dynamic_op`) must be set to true, and initialization takes a long time. With TensorFlow 2, we can use sample data to build the engine and then save it.
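
For reference, the TF2 build-and-save flow might look roughly like the sketch below. It is not the exact script behind these numbers: the directory arguments, the `hfnet_input_shape` helper, and the single-channel NHWC input layout are all assumptions made for illustration, and it needs a TensorFlow 2 build with TensorRT support to actually run the conversion.

```python
def hfnet_input_shape(width=400, height=208):
    """Shape of one input frame. NHWC with a single grayscale channel is an
    assumption about the exported model, not something stated in this thread."""
    return (1, height, width, 1)


def build_and_save_engine(saved_model_dir, output_dir, shape, precision="FP16"):
    """Convert a SavedModel with TF-TRT, pre-build the engine, and save it."""
    # Imported lazily so the shape helper above works without TensorFlow.
    import numpy as np
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=precision)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,  # hypothetical path
        conversion_params=params,
    )
    converter.convert()

    def input_fn():
        # One representative batch; the engine is built for this input shape.
        yield (np.random.rand(*shape).astype(np.float32),)

    converter.build(input_fn=input_fn)  # builds the TensorRT engine now
    converter.save(output_dir)          # hypothetical output directory
```

The saved directory should then be loadable with `tf.saved_model.load` and callable through its serving signature, so the engine does not have to be rebuilt on the first inference. Random data is used here only to fix the input shape; real frames would be a better choice.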
Here is my benchmark:

| Platform | Precision | Device | Size | Inference Time | Loading Time / Notes |
| --- | --- | --- | --- | --- | --- |
| TensorRT@TF1 | FP16 | TITAN | 400x208 | 8.49 ms ± 151 µs | |
| TensorRT@TF1 | FP32 | TITAN | 400x208 | 8.29 ms ± 37.3 µs | |
| TensorFlow 1.x | FP32 | TITAN | 400x208 | 12.1 ms ± 1.89 ms | |
| TensorRT@TF2 | FP16 / FP32 | TITAN | 400x208 | 6.95 ms ± 114 µs / 6.86 ms ± 48.3 µs | |
| TensorRT@TF1 | FP16 | TX2 | 400x208 | 28.3 ms ± 166 µs (100 kps), 29.1 ms ± 883 µs (200 kps) | 19.35 (graph loading) + 240.52 s |
| TensorRT@TF1 | FP32 | TX2 | 400x208 | 31.3 (100 kps), 31.6 (200 kps) | |
| TensorFlow 1.x | - | TX2 | 400x208 | 67.1 ms (200 kps) | |
| TensorRT@TF2 | FP32 | TX2 | 400x208 | 37.5 ms ± 196 µs per loop | - |
| TensorRT@TF2 | FP16 | TX2 | 400x208 | 34.5 ms ± 140 µs per loop | 68.5 s (graph loading) + 22.08 s |
| TensorFlow 2 | FP32 | TX2 | 400x208 (100 kps) | 49.13 ms @ 100 kps, 49.26 ms @ 200 kps | Tested on real data; stable, no cases above 100 ms |
| TensorRT@TF2 | FP16 | TX2 | 400x208 @ 100 kps | 119.43 ms | Inference time is unstable when features are lacking; the error may be caused by zero vectors |
| TensorRT@TF1 | FP16 | TX2 | 400x208 @ 100 kps | 66 ms | Dynamic op = False; real data |
| TF1.x | FP32 | TX2 | 400x208 | 67.39 ms @ 100 kps, 67.39 ms @ 200 kps | 67.27 ms @ 200 kps, 0.15 mem; real data |

However, TF-TRT acceleration for HFNet is not stable. I found that when an image lacks feature points, TF-TRT detects zero-dimension tensors and falls back to TensorFlow. This fallback is really slow and in practice sometimes takes a few seconds; you can see in my real-data results that inference becomes really slow. I also tested cutting HFNet into its SuperPoint and NetVLAD parts (I cut HFNet itself, rather than using the standalone models directly, to keep the results consistent).

| Platform | Model | Precision | Device | Size | Inference Time | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| TensorFlow 2 | SuperPoint@HFNet | FP32 | TX2 | 400x208 | 38 ms @ 200 kps | Real data; first try 4 s |
| TensorFlow 2 | NetVLAD@HFNet | FP32 | TX2 | 400x208 | 25.49 ms | Real data; first try 8.4 s |
| TensorRT@TensorFlow 2 | NetVLAD@HFNet | FP16 | TX2 | 400x208 | 12.73 ms | Real data; first try 5.57 s |
| TensorRT@TensorFlow | NetVLAD@HFNet | FP32 | TX2 | 400x208 | 12.75 ms | Real data; first try 5.5 s |
| TensorRT@TensorFlow | SuperPoint@HFNet | FP16 | TX2 | 400x208 | 26~27 ms | Fast case only; not practical now. First try 27 ms; sometimes becomes very large (~400 ms) |

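
To make the "first try" cost and the occasional slow frames easier to compare across setups, a small timing helper along these lines could be used. This is only a sketch using the Python standard library: `profile_latency`, its parameters, and the "5x the median" spike rule are hypothetical choices, not how the numbers above were measured.

```python
import statistics
import time


def profile_latency(infer, inputs, warmup=1, spike_factor=5.0):
    """Time infer(x) for every x in inputs.

    The first `warmup` calls are reported separately, since they include
    graph loading and engine setup. A later call counts as a spike when it
    takes more than `spike_factor` times the steady-state median.
    Needs more inputs than `warmup`.
    """
    timings_ms = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        timings_ms.append((time.perf_counter() - start) * 1e3)

    first, steady = timings_ms[:warmup], timings_ms[warmup:]
    median = statistics.median(steady)
    return {
        "first_ms": first,
        "mean_ms": statistics.mean(steady),
        "std_ms": statistics.pstdev(steady),
        "max_ms": max(steady),
        # Indices (into `inputs`) of frames that were unusually slow.
        "spikes": [i + warmup for i, t in enumerate(steady)
                   if t > spike_factor * median],
    }
```

Here `infer` would be whatever callable runs one frame through the model, and `inputs` a list of real frames. Looking at the frames listed under `spikes` should show whether the slow cases line up with images that have few or no keypoints.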
Thanks a lot! I'm going to give it a try ASAP.
You also mentioned that TF2 can build an engine; is it loaded directly by the TF-TRT API? If so, I guess HFNet can perhaps be converted to a TRT engine with TF 2.0 and the other necessary tools. I will try that as well.
Keep in touch :)
Originally posted by @Irwin-Liu in https://github.com/Irwin-Liu/hfnet-tf2onnx/issues/1#issuecomment-709180357