Closed · Tetsujinfr closed this issue 3 years ago
@Tetsujinfr Does rvm_mobilenetv3_fp16.torchscript have to run on a GPU? Why does running this model on the CPU throw a lot of errors, while float32 works fine?
I do not speak Chinese. Based on a translation, I am not sure I get your point. I am running inference on the GPU here, not the CPU. Even though the model is named "mobile", it is just a smaller network; it still executes on the GPU, no?
Well, when you ran inference with float16, which format did you use, .onnx or .torchscript? I used the rvm_mobilenetv3_fp16.torchscript distributed on GitHub, and the errors are as follows:

"compute_indices_weights_linear" not implemented for 'Half'
"unfolded2d_copy" not implemented for 'Half'
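For context: those two errors come from PyTorch's CPU backend, which has no Half (float16) kernels for several ops this model uses (e.g. the bilinear-interpolation and conv2d-unfold paths), so the fp16 TorchScript file effectively requires CUDA. Below is a minimal sketch, not the repo's official snippet, of the two workable paths: run fp16 on GPU, or cast the model back to float32 for CPU. The call signature follows the repo's documented `model(src, *rec, downsample_ratio)` API; the input size is an illustrative assumption.

```python
import torch

model = torch.jit.load("rvm_mobilenetv3_fp16.torchscript")

if torch.cuda.is_available():
    # fp16 path: keep Half precision and run on CUDA, where the kernels exist.
    model = model.cuda().eval()
    frame = torch.rand(1, 3, 1080, 1920, dtype=torch.float16, device="cuda")
else:
    # CPU fallback: cast the fp16 weights back to float32; otherwise ops like
    # interpolation raise "not implemented for 'Half'" on the CPU backend.
    model = model.float().eval()
    frame = torch.rand(1, 3, 1080, 1920, dtype=torch.float32)

rec = [None] * 4  # recurrent states; reset at the start of each clip
with torch.no_grad():
    fgr, pha, *rec = model(frame, *rec, 0.25)  # 0.25 = downsample_ratio
```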
Hi, thanks for this repo, really cool!
I gave inference_speed.py a try on a 3090 and got almost the same results: 169 fps for fp32 and 168 fps for fp16, at HD resolution with downsample 0.25, on mobilenetv3.
Shouldn't I get significantly faster speeds with fp16?
Edit: I can see a difference of 15% on resnet50, so I guess it has to do with the nature of the pre-trained models used.
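For anyone reproducing this, here is a rough timing sketch along the lines of what inference_speed.py measures (the input size, iteration count, and helper name are assumptions, not the script's exact code). Proper CUDA timing needs a warm-up and torch.cuda.synchronize(), otherwise the asynchronous kernel launches can make fp16 and fp32 look identical for the wrong reason.

```python
import time
import torch

def benchmark(model, dtype, n_iters=100, downsample_ratio=0.25):
    """Rough fps measurement for one precision; assumes a CUDA device."""
    model = model.to(device="cuda", dtype=dtype).eval()
    src = torch.rand(1, 3, 1080, 1920, dtype=dtype, device="cuda")
    rec = [None] * 4  # recurrent states, as in the repo's documented API
    with torch.no_grad():
        # Warm-up so cuDNN autotuning and lazy initialization don't skew timing.
        for _ in range(10):
            fgr, pha, *rec = model(src, *rec, downsample_ratio)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_iters):
            fgr, pha, *rec = model(src, *rec, downsample_ratio)
        torch.cuda.synchronize()  # wait for all queued kernels before stopping
    return n_iters / (time.time() - start)

model = torch.jit.load("rvm_mobilenetv3_fp16.torchscript")
print(f"fp16: {benchmark(model, torch.float16):.0f} fps")
print(f"fp32: {benchmark(model, torch.float32):.0f} fps")
```

One plausible explanation for the small mobilenetv3 gap, consistent with the 15% seen on resnet50: depthwise-separable convolutions tend to be memory-bound rather than compute-bound, so halving the arithmetic precision helps them less than it helps resnet50's dense convolutions.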