round_to_fixed function

Hi, thanks for the great work! I am very interested in it.

However, I am new to quantization and have some questions about the round_to_fixed function in deepshift.utils, lines 7-18.

In line 15, torch.floor(input/delta) rounds the fp32 input down to a 16-bit integer. In my opinion, the clamp function should then be applied to restrict these integers to the range (min_val, max_val), that is, changing lines 15-17 to the following:

    rounded = torch.floor(input/delta)
    rounded = torch.clamp(rounded, min_val, max_val)
    rounded = rounded*delta
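To make the comparison concrete, here is a minimal plain-Python sketch of the two orderings on a single scalar. It is not the repository's code: the function names are hypothetical, `math.floor` and a small `clamp` helper stand in for `torch.floor` and `torch.clamp`, and the values of `DELTA`, `MIN_VAL`, and `MAX_VAL` are assumptions (16 fraction bits and a signed 16-bit bound) chosen only for illustration.

```python
import math

# Assumed constants for illustration only (not taken from the repository).
DELTA = 2.0 ** -16      # step size for 16 fraction bits
MIN_VAL = -(2 ** 15)    # assumed signed 16-bit lower bound
MAX_VAL = 2 ** 15 - 1   # assumed signed 16-bit upper bound


def clamp(x, lo, hi):
    """Scalar stand-in for torch.clamp."""
    return max(lo, min(hi, x))


def scale_then_clamp(x, delta=DELTA, min_val=MIN_VAL, max_val=MAX_VAL):
    """Floor onto the grid, rescale to a real value, then clamp that value."""
    rounded = math.floor(x / delta) * delta
    return clamp(rounded, min_val, max_val)


def clamp_then_scale(x, delta=DELTA, min_val=MIN_VAL, max_val=MAX_VAL):
    """Ordering proposed in this issue: clamp the integer code, then rescale."""
    rounded = math.floor(x / delta)
    rounded = clamp(rounded, min_val, max_val)
    return rounded * delta


if __name__ == "__main__":
    x = 0.75
    print(scale_then_clamp(x))   # bounds act on the real value, so 0.75 passes through
    print(clamp_then_scale(x))   # bounds act on the integer code, result is just under 0.5
```

Under these assumed constants the two versions agree only when the bounds are expressed in the same units as the quantity being clamped. With the same min_val and max_val, clamping the integer code restricts the representable real range to roughly [-0.5, 0.5), whereas clamping after rescaling restricts it to [min_val, max_val].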

Could you give me some comments about the difference of these two implementations? Thanks!!

GATECH-EIC / ShiftAddNet

round_to_fixed function #5