dmytro-omelian / ideation


[model performance] apply quantization #15

Open dmytro-omelian opened 8 months ago

dmytro-omelian commented 8 months ago

Quantization reduces the precision of the model weights from float32 to int8, which can decrease model size and increase inference speed at a slight cost to accuracy.
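
A minimal sketch of what this could look like with PyTorch's post-training dynamic quantization, assuming the model is a standard `nn.Module` (the `TinyClassifier` class and layer sizes below are placeholders, not part of this repo):

```python
import torch
import torch.nn as nn

# Placeholder model: any nn.Module containing Linear layers works the same way.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyClassifier().eval()

# Dynamic quantization: weights of the listed module types are converted
# from float32 to int8; activations stay in float and are quantized
# on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The inference API is unchanged; the quantized model is smaller and
# typically faster on CPU.
with torch.no_grad():
    out = quantized_model(torch.randn(1, 128))
print(out.shape)
```

Dynamic quantization is the lowest-effort option since it needs no calibration data; static quantization or quantization-aware training could recover more accuracy if the drop turns out to matter.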