Open dmytro-omelian opened 8 months ago
Quantization reduces the precision of the weights from float32 to int8, which can decrease model size (roughly 4x, since int8 uses one byte per weight instead of four) and increase inference speed, at a slight cost to accuracy.
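To make the idea concrete, here is a minimal NumPy sketch of affine (asymmetric) int8 quantization: floats are mapped to 256 integer levels via a scale and zero point, and dequantized back with bounded error. The helper names are illustrative only, not part of any proposed API.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto the int8 range [-128, 127] with an
    affine (scale + zero-point) scheme."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0            # 256 representable levels
    zero_point = int(round(-128 - w_min / scale))  # where 0.0 lands in int8 space
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.RandomState(0).randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
```

The reconstruction error per weight is at most about one quantization step (`scale`), which is the "slight cost to accuracy" mentioned above; real frameworks refine this with per-channel scales and calibration.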