Closed: Neyri closed this issue 1 month ago
Hello @Neyri
Thanks for the feedback!
The `model_required_memory` function computes the estimated VRAM footprint of a model once it is loaded on the GPU. We assume models are quantized to 4 bits (int4), which is likely not the case for every provider, but it balances out the lack of support for other optimizations on our side.
We base our calculation on this blog post. You are right that models are usually represented in float32, i.e. 4 bytes (32 bits) per parameter; in that case, the memory footprint at inference is approximated by:
$$ 4\ \text{bytes} \times \text{(No. Params)} \times 1.2 = \frac{32\ \text{bits}}{8\ \text{bits/byte}} \times \text{(No. Params)} \times 1.2 $$
So for a model quantized to 4 bits (represented as int4), the memory footprint is approximated by:
$$ \frac{4\ \text{bits}}{8\ \text{bits/byte}} \times \text{(No. Params)} \times 1.2 $$
The result is in Gigabytes (GB) because we count the number of parameters in billions.
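To make the arithmetic concrete: a 70B-parameter model quantized to int4 needs roughly 0.5 bytes × 70 × 1.2 = 42 GB, versus 4 bytes × 70 × 1.2 = 336 GB in float32. Below is a minimal Python sketch of that estimate; the constant names mirror the ones discussed in this issue, while the overhead factor name and the exact function signature are assumptions and may differ from the package's actual code.

```python
# Minimal sketch of the estimate above; not the package's exact implementation.
MODEL_QUANTIZATION_BITS = 4   # assumed int4 quantization (constant discussed in this issue)
MEMORY_OVERHEAD_FACTOR = 1.2  # ~20% inference overhead (hypothetical name)


def model_required_memory(n_params_billion: float) -> float:
    """Approximate VRAM needed to serve a model, in GB.

    Bits are converted to bytes per parameter (bits / 8), and since the
    parameter count is expressed in billions, the result is directly in GB.
    """
    bytes_per_param = MODEL_QUANTIZATION_BITS / 8  # 0.5 bytes per parameter for int4
    return bytes_per_param * n_params_billion * MEMORY_OVERHEAD_FACTOR


print(model_required_memory(70))  # -> 42.0 GB for a 70B-parameter model in int4
```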
Hi,
Thank you for the great work on this package. While reading through the code and constants, I was wondering whether there is an issue with the `MODEL_QUANTIZATION_BITS = 4` constant, which is then divided by 8 in the `model_required_memory` function. As I understand it from the docs, it should represent the encoding size of the LLM parameters, which is compared with the available RAM on the GPU. The GPU RAM constant seems to be expressed in bytes (8 bits each), and from what I've read the encoding size is usually 32 bits (4 bytes). I think there is a mistake in the units there. Can you confirm?