Quantized models on CPU - CTransformers

dottxt-ai / outlines

Structured Text Generation

https://dottxt-ai.github.io/outlines/

Apache License 2.0

Quantized models on CPU - CTransformers #225

Open harryjulian opened 1 year ago

harryjulian commented 1 year ago

Is there any chance of a future integration with CTransformers or something similar to allow for guided generation using quantized models on CPU? If I were to try and hack away at this, what would be the best approach?

rlouf commented 1 year ago

It is definitely possible, provided the library exposes the model's forward pass so that the logits can be retrieved. You would just need to implement the same interface as in this file, and it should work out of the box.
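To illustrate why access to the logits is the key requirement, here is a minimal sketch of guided generation on top of a bare forward pass. It is not the interface from the file referenced above, and it does not use the CTransformers API: `ToyModel`, `forward`, `mask_logits`, and `guided_greedy` are hypothetical names invented for this example, and the "model" returns fake logits so the snippet runs with the standard library alone. The idea is that at each step the disallowed tokens are masked to negative infinity before the next token is chosen.

```python
import math
from typing import Callable, List, Sequence, Set


class ToyModel:
    """Hypothetical stand-in for a quantized CPU model.

    The only capability assumed is a forward pass that returns one logit
    per vocabulary entry for the next token.
    """

    def __init__(self, vocab: Sequence[str]):
        self.vocab = list(vocab)

    def forward(self, token_ids: List[int]) -> List[float]:
        # Fake logits: favour the token that follows the last one in the vocab.
        n = len(self.vocab)
        last = token_ids[-1] if token_ids else -1
        return [1.0 if i == (last + 1) % n else 0.0 for i in range(n)]


def mask_logits(logits: List[float], allowed: Set[int]) -> List[float]:
    """Set the logit of every disallowed token to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def guided_greedy(
    model: ToyModel,
    prompt_ids: List[int],
    allowed_fn: Callable[[List[int]], Set[int]],
    max_tokens: int,
) -> List[int]:
    """Greedy decoding where allowed_fn constrains each next token."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        allowed = allowed_fn(ids)
        if not allowed:
            break  # the constraint permits no further tokens
        masked = mask_logits(model.forward(ids), allowed)
        # Pick the highest remaining logit (first index wins on ties).
        ids.append(max(range(len(masked)), key=masked.__getitem__))
    return ids


if __name__ == "__main__":
    model = ToyModel(["a", "b", "0", "1", "2"])
    digits_only = lambda ids: {2, 3, 4}  # constraint: digits only
    out = guided_greedy(model, [], digits_only, max_tokens=3)
    print("".join(model.vocab[i] for i in out))  # prints "012"
```

In a real integration, `forward` would wrap whatever call the quantized backend provides for evaluating tokens and reading logits, and `allowed_fn` would be driven by the regex or grammar state rather than a fixed set.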