ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

the performance of onnx fp16 seems to be even worse than that of onnx fp32 #182

Open lierer007 opened 10 months ago

lierer007 commented 10 months ago

src/transformer_deploy/convert.py

model_path = onnx_model_path if is_fp16 else optim_model_paths[0]

There may be a bug in this line of code: it could explain why the experimental results show ONNX fp16 performing even worse than ONNX fp32.
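A minimal sketch of the selection logic being questioned (the file names and the meaning of the two paths are assumptions for illustration; this is not the verified behavior of convert.py). If `optim_model_paths[0]` holds the optimized artifact, the ternary as written would pick the unoptimized export whenever `is_fp16` is true, which would match the reported slowdown:

```python
# Hypothetical values mirroring the snippet above (assumed names/roles).
onnx_model_path = "model.onnx"            # plain ONNX export (assumed)
optim_model_paths = ["model-optim.onnx"]  # optimized artifact (assumed)

is_fp16 = True

# Line as it appears in convert.py: with is_fp16 set, it selects
# the plain export rather than the optimized one.
model_path = onnx_model_path if is_fp16 else optim_model_paths[0]
print(model_path)  # the fp16 run would load the unoptimized file

# Suspected intended behavior (speculative): pick the optimized
# artifact when fp16 is requested, by swapping the two branches.
model_path_fixed = optim_model_paths[0] if is_fp16 else onnx_model_path
print(model_path_fixed)
```

If this reading is right, the fix would just be to swap the two sides of the conditional, but a maintainer would need to confirm what each path actually contains at that point in the conversion pipeline.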