When looking at the code, I'm a bit confused about the scaling factors. Take Llama-2's 4096 hidden dimension with block_size=128 as an example: in the code, the salient, non-salient-1, and non-salient-2 groups in each 4096x128 block are scaled in `high_order_residual`, and each group has its own 4096x1 scaling factor. So the total number of scaling factors for a 4096x4096 matrix is 3 x 4096 x (4096/128)?
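Just to make my arithmetic explicit, here is how I'm counting them (a back-of-the-envelope sketch; the three-group split per block is my reading of `high_order_residual`, not something the paper states in these terms):

```python
hidden = 4096
block_size = 128
num_groups = 3  # salient, non-salient-1, non-salient-2 (my reading of the split)

num_blocks = hidden // block_size       # 32 column blocks in a 4096x4096 matrix
alphas_per_block = num_groups * hidden  # each group keeps a 4096x1 scaling factor
total = alphas_per_block * num_blocks
print(total)  # 3 * 4096 * 32 = 393216
```

That is 393,216 float scaling factors for a single 4096x4096 weight matrix, which is what prompted my question about the storage overhead.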
What does the storing bit represent, and why is the average calculated in this way?
If we use such fine-grained scaling factors for acceleration, is it still possible to perform the computation (e.g., GEMM) in low-bit and then dequantize the result back to actual values using the scaling factors?
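To illustrate what I mean: since each scaling factor applies to one output row within a group, dequantizing after the low-bit GEMM should be equivalent to dequantizing the weights first. A toy numpy sketch (made-up shapes, sign binarization standing in for the actual quantizer):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # stand-in for one 4096x128 weight block
x = rng.standard_normal(16)       # activation slice for this block

alpha = np.abs(W).mean(axis=1, keepdims=True)  # 8x1 per-row scaling factor
B = np.sign(W).astype(np.int8)                 # low-bit (binary) weights

# Dequantize the weights first, then GEMM (the "actual value" path):
y_ref = (alpha * B) @ x
# GEMM in low-bit, then rescale the result with the scaling factors:
y_lowbit = alpha[:, 0] * (B @ x)

print(np.allclose(y_ref, y_lowbit))  # True
```

So mathematically the rescaling commutes with the matmul; my question is whether the kernel/implementation side actually exploits this, given how many per-group factors there are.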