Closed kkontny closed 8 months ago
Rationale: some models support PyTorch native FP16 mode. It is preferable to Implicit mode, especially for decoder models, because much less time is spent on conversions.
Also removes merge_qkv(): it is now implemented in a better way in the PyTorch connector. This PR has to be merged together with https://github.com/AmpereComputingAI/transformers/pull/2
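
For reviewers unfamiliar with the term, here is a minimal sketch of what "native FP16" means on the PyTorch side, assuming a toy `torch.nn.Sequential` model as a hypothetical stand-in for a real one. It only shows the one-time weight cast; it does not show Implicit mode, whose configuration is not described in this PR, and it does not reflect the actual changes made here.

```python
import torch

# Hypothetical stand-in for a real model; any torch.nn.Module is cast the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4),
)

# Native FP16: cast all floating-point parameters to half precision once, up front,
# so the weights are already stored as FP16 when inference starts.
model = model.half().eval()

# Collect the parameter dtypes to confirm the cast took effect.
param_dtypes = {p.dtype for p in model.parameters()}
print(param_dtypes)
```

Whether a given model actually runs correctly in native FP16 depends on the model and the backend, which is why the description above says only that some models support it.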