Open eschaffn opened 1 year ago
I'm sorry, but I'm also new to quantization and model deployment in general.
Another route is to use a bigger model, "sparsify" it, and then prune the unused connections to reduce inference time.
Hey there!
Is it possible to do post-training quantization with PARSeq? I'm looking for ways to speed up inference. I tried training a parseq-tiny model but lost about 13% absolute validation accuracy.
I'm new to quantization and am unsure which types of models benefit from it or which type of quantization to use.
Thanks for any suggestions!
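For anyone new to the topic, the core idea behind post-training quantization is just mapping float weights onto a small integer grid. A minimal sketch of affine int8 quantization is below; this is illustrative only and not PARSeq-specific (real frameworks use per-channel scales and calibrated activation ranges):

```python
# Minimal affine int8 quantization of a weight list (illustrative only).
def quantize(ws, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant weights
    zero_point = round(qmin - lo / scale)
    # Round each weight onto the integer grid and clamp to [qmin, qmax].
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in ws]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Map integers back to floats; the round-trip error is bounded by ~scale/2.
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, -0.4, 0.0, 0.7, 1.5]
q, s, z = quantize(weights)
recovered = dequantize(q, s, z)
```

The per-weight round-trip error is at most about half the scale, which is why accuracy usually degrades only slightly when the weight range is well behaved.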
Did you manage to speed up inference time? And did you use post-training quantization?
I have added quantization support in a separate fork of this repo, which you can check here: https://github.com/VikasOjha666/parseq
By default, the model is trained with quantization-aware training, which helps preserve accuracy after quantization.
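For the post-training route asked about above, PyTorch's dynamic quantization is a low-effort starting point: weights are stored as int8 and activations are quantized on the fly at inference. The sketch below uses a small stand-in model, not PARSeq itself, since loading the real checkpoint is outside the scope of this example:

```python
import torch
import torch.nn as nn

# Stand-in for a trained model; replace with your own eval-mode model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
model.eval()

# Post-training dynamic quantization: int8 weights for all nn.Linear layers.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(4, 16)
with torch.no_grad():
    y_fp32 = model(x)
    y_int8 = qmodel(x)
```

Note that transformer-based models like PARSeq are dominated by linear layers, so dynamic quantization of `nn.Linear` is usually where most of the speedup comes from on CPU; measure latency before and after, since gains depend on the backend (fbgemm on x86, qnnpack on ARM).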