NolanoOrg / cformers

SoTA Transformers with C-backend for fast inference on your CPU.

Any plans to update models and their quantizations? #44

Open Calandiel opened 1 year ago

Calandiel commented 1 year ago

ggml now has support for Q1_O quantization, which was reported to offer better inference quality for some models at the cost of slower execution. At the same time, Open Assistant has released newer weights for the Pythia-based model than the ones currently being pulled. Perhaps it'd be worth updating the models on Hugging Face using the new quantization method? I would make a PR for it myself, but I don't have access to a machine with enough RAM to quantize the 12B model. A rough sketch of the workflow I have in mind is below.
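
A minimal sketch of what I mean, assuming a standard Hugging Face download plus a conversion/quantization script. The repo id, script name (`convert_and_quantize.py`), and `--qtype` flag below are placeholders, not confirmed cformers interfaces; only `huggingface_hub.snapshot_download` is a real API here.

```python
# Hypothetical workflow: fetch the newer Open Assistant Pythia weights
# and hand them to a converter/quantizer. Names marked as placeholders
# would need to be swapped for whatever cformers actually provides.
import subprocess
from huggingface_hub import snapshot_download

# Download the updated checkpoint (repo id is an assumption).
weights_dir = snapshot_download("OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5")

# Convert and quantize with the project's tooling; the script path and
# the "q1_o" quantization type argument are placeholders.
subprocess.run(
    ["python", "convert_and_quantize.py", weights_dir, "--qtype", "q1_o"],
    check=True,
)
```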