NolanoOrg / cformers

SoTA Transformers with C-backend for fast inference on your CPU.
MIT License

Keep model in RAM? #35

Open mallorbc opened 1 year ago

mallorbc commented 1 year ago

Is there any way we can keep the model loaded in RAM? Generation itself is very fast, but there is startup time for every generation. Keeping the model in RAM so it can be reused whenever input is given would be great.
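What is being requested could be sketched as a long-lived process that pays the model-load cost once and then serves many generation requests. The `load_model` and `generate` functions below are hypothetical stand-ins (cformers' actual API is not shown in this thread); only the load-once / generate-many structure is the point.

```python
def load_model(path):
    # Hypothetical stand-in for the expensive one-time model load
    # (in practice: deserializing quantized weights into RAM).
    return {"path": path}

def generate(model, prompt):
    # Hypothetical stand-in for the fast C-backend generation call.
    return f"echo: {prompt}"

def serve(prompts, model_path="model.bin"):
    # Load the model into RAM exactly once, then reuse it
    # for every incoming prompt -- no per-request startup cost.
    model = load_model(model_path)
    for line in prompts:
        prompt = line.strip()
        if not prompt:
            continue
        yield generate(model, prompt)
```

In a real deployment, `prompts` could be lines read from stdin, a socket, or an HTTP endpoint; the key design choice is that `load_model` sits outside the request loop.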

Ayushk4 commented 1 year ago

This functionality has yet to be implemented. Similar to Issue #36.