In Figure 6 of the paper, why are the memory usages for the 7B, 13B, 33B, and 65B models reported as 5046 MiB, 8476 MiB, 19302 MiB, and 37074 MiB, rather than roughly 3.5 GB, 6.5 GB, 16.5 GB, and 32.5 GB?
I understand that some memory is needed for the quantization constants, but that alone should not account for a gap this large: for 7B, for example, the difference is about 1.5 GB.
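To make the arithmetic concrete, here is a back-of-envelope sketch of the 4-bit weight footprint plus quantization-constant overhead. It assumes NF4 quantization with a blocksize of 64 and double quantization as described in the QLoRA paper (8-bit first-level constants per 64-parameter block, FP32 second-level constants per 256 blocks); the function name and exact constant layout are my assumptions, not taken from the paper's code.

```python
def qlora_weight_gb(n_params_billion: float) -> float:
    """Estimate 4-bit weight memory in GB (decimal), including
    quantization constants under double quantization.
    Assumptions: NF4, blocksize 64, 8-bit first-level constants,
    FP32 second-level constants per 256 blocks (per the QLoRA paper)."""
    n = n_params_billion * 1e9
    weight_bits = 4 * n                      # 4 bits per parameter
    const_bits = n / 64 * 8                  # first-level constants
    const_bits += n / (64 * 256) * 32        # second-level constants
    return (weight_bits + const_bits) / 8 / 1e9

for b in (7, 13, 33, 65):
    print(f"{b}B: {qlora_weight_gb(b):.2f} GB")
```

For 7B this gives about 3.61 GB, i.e. the constants add only on the order of 0.1 GB over the bare 3.5 GB of 4-bit weights. So the remaining gap to the reported 5046 MiB presumably comes from other sources (e.g. the CUDA context, activations, LoRA adapters, and optimizer state), not from the quantization constants themselves.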