alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Implement 16-bit quantization of the models #140

Open nshmyrev opened 4 years ago

nshmyrev commented 4 years ago

So we can use bigger models on mobile more efficiently.

nshmyrev commented 3 years ago

As I understand it, there are no good 16-bit matrix libraries for ARM, so we actually have to quantize to 8 bits and use QNNPACK. Some day, maybe with the PyTorch move.

The LM and graphs could be quantized to 16 bits, or even 10 bits.
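For readers unfamiliar with what quantizing to a given bit width means, here is a minimal sketch of symmetric linear quantization in plain Python. It assumes nothing about Kaldi or Vosk internals (all names are illustrative): floats are mapped to signed n-bit integers via a single scale factor, then mapped back, and the round-trip error is compared for 8 vs. 16 bits.

```python
def quantize(values, bits):
    """Map floats to signed n-bit ints using one symmetric scale factor."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8-bit, 32767 for 16-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize(ints, scale):
    """Map the integers back to approximate floats."""
    return [q * scale for q in ints]

# Toy "weights" standing in for LM/graph parameters.
weights = [0.731, -0.252, 0.004, -0.998, 0.5]

for bits in (8, 16):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit max round-trip error: {err:.6f}")
```

The worst-case error is about half the scale step, which is why 16-bit storage is essentially lossless for LMs and graphs while 8-bit weights usually need a quantization-aware backend like QNNPACK for the matrix math.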

silvioprog commented 7 months ago

Hey @nshmyrev, just out of curiosity: is there any news about this feature? 🙂

nshmyrev commented 7 months ago

In the development branch https://github.com/alphacep/vosk-api/tree/vosk-new we support PyTorch models with 8-bit quantization.

silvioprog commented 7 months ago

Wow, this is amazing news! I'll try to run an app from this branch on macOS and Linux.

silvioprog commented 7 months ago

@nshmyrev sorry, another quick question. Do you know when the version (and models) with quantization will be released?

nshmyrev commented 7 months ago

> @nshmyrev sorry, another quick question. Do you know when the version (and models) with quantization will be released?

no