When testing right after training, the scales are still float32, so merging the scales into the weights is done in fp32. In resume-test, the scales are loaded as float16 and the merge is also done in fp16. This difference causes a slight accuracy difference. So we need to cast the scales to float16 immediately after training finishes, before merging them into the weights.
I have validated this in one case: the accuracy from "test right after training" and from "resume-test" is now exactly the same.
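The sketch below illustrates why the two paths can disagree and why casting the scales early makes them match. It assumes that "merging" means an elementwise multiply of an fp16 weight by its scale. That is an assumption, since the actual merge op is not shown here. It emulates fp16/fp32 rounding with the standard library's `struct` half- and single-precision formats instead of a real framework, and the function names are hypothetical. In a PyTorch codebase the fix would presumably amount to calling `.half()` on the scale tensors right after training.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def merge_fp32(weight16: float, scale32: float) -> float:
    # Test right after training (hypothetical): the scale is still fp32,
    # the multiply is rounded to fp32, and only the merged weight is stored
    # as fp16. The double-precision product is exact here, so to_fp32 gives
    # the true fp32 product.
    return to_fp16(to_fp32(weight16 * scale32))

def merge_fp16(weight16: float, scale16: float) -> float:
    # Resume-test (hypothetical): the scale was saved/loaded as fp16 and the
    # multiply is rounded to fp16. The product of two fp16 values is exact
    # in double precision, so one rounding matches a real fp16 multiply.
    return to_fp16(weight16 * scale16)

def merge_after_fix(weight16: float, scale32: float) -> float:
    # The fix: cast the scale to fp16 as soon as training finishes, then
    # merge exactly as resume-test does.
    return merge_fp16(weight16, to_fp16(scale32))

w = to_fp16(3.0)       # an fp16 weight
s = to_fp32(1.0004)    # a trained scale, still fp32

print(merge_fp32(w, s))            # 3.001953125 (old test-after-training)
print(merge_fp16(w, to_fp16(s)))   # 3.0 (resume-test)
print(merge_after_fix(w, s))       # 3.0 (matches resume-test)
```

Here 1.0004 rounds to 1.0 in fp16, so the fp16 merge yields 3.0, while the fp32 merge yields about 3.0012, which rounds up to the next fp16 value. With the early cast, both paths see the same fp16 scale and produce identical merged weights.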