Open dmytro-omelian opened 8 months ago
Quantization reduces the precision of the weights from float32 to int8, which can decrease model size (roughly 4x, since int8 uses one byte per weight instead of four) and increase inference speed, at a slight cost to accuracy.
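To make the idea concrete, here is a minimal NumPy sketch of affine (asymmetric) int8 quantization: floats are mapped to 256 integer levels via a scale and zero point, and dequantized back with bounded error. The helper names are illustrative only, not part of any proposed API.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto the int8 range [-128, 127] with an
    affine (scale + zero-point) scheme."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0            # 256 representable levels
    zero_point = int(round(-128 - w_min / scale))  # where 0.0 lands in int8 space
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.RandomState(0).randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
```

The reconstruction error per weight is at most about one quantization step (`scale`), which is the "slight cost to accuracy" mentioned above; real frameworks refine this with per-channel scales and calibration.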