Add switch to use Pytorch fp16 mode

AmpereComputingAI / ampere_model_library

AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)

https://hub.docker.com/u/amperecomputingai

Apache License 2.0


Add switch to use Pytorch fp16 mode #224

Closed kkontny closed 8 months ago

kkontny commented 8 months ago

Rationale: some models support PyTorch's native FP16 mode. It is preferable to Implicit mode, especially for decoder models, because much less time is spent on conversions.
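
As a rough illustration of what such a switch might look like, here is a minimal sketch. The `--precision` flag and the `parse_args` and `prepare_model` helpers are hypothetical names chosen for this example and are not taken from the PR; the only PyTorch call assumed is `torch.nn.Module.half()`, which casts floating-point parameters and buffers to FP16 in one step.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI switch; the actual flag added by the PR may differ.
    parser = argparse.ArgumentParser(description="precision switch sketch")
    parser.add_argument(
        "--precision",
        choices=["fp32", "fp16"],
        default="fp32",
        help="fp16 selects PyTorch native half precision",
    )
    return parser.parse_args(argv)


def prepare_model(model, precision):
    # In native FP16 mode the weights are cast once, up front, so they do
    # not need to be converted again on every forward pass.
    if precision == "fp16":
        return model.half()  # torch.nn.Module.half() returns the module itself
    return model
```

With PyTorch installed, this could be used as `model = prepare_model(torch.nn.Linear(4, 2), parse_args().precision)`; inputs would then also need to be FP16 tensors.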

Also removes merge_qkv(): it is now implemented in a better way in the PyTorch connector. This PR has to be merged together with https://github.com/AmpereComputingAI/transformers/pull/2