NolanoOrg / cformers

SoTA Transformers with C-backend for fast inference on your CPU.
MIT License

Add support for GLM/chatGLM models #20

Open MarkSchmidty opened 1 year ago

MarkSchmidty commented 1 year ago

ChatGLM-6B is an open-source model based on GLM, trained on over 1 trillion tokens and fine-tuned with dialogue data and RLHF for chat.

It's quickly becoming one of the most popular local models despite having no fast CPU inference support (yet).

Official Repo: https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md

Ayushk4 commented 1 year ago

Are you aware of any differences between GLM's architecture and GPT-NeoX? If not, then all we need to do is quantize it. One quick way to check would be to diff the two Hugging Face configs, as in the sketch below.
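
A minimal sketch of that comparison, assuming the `THUDM/chatglm-6b` and `EleutherAI/gpt-neox-20b` Hub checkpoints (ChatGLM ships custom modeling code, so `trust_remote_code` is needed); this is just an illustration, not code from this repo:

```python
# Compare Hub configs to spot architectural differences between
# ChatGLM-6B and GPT-NeoX (hidden sizes, attention heads, positional
# encoding options, etc.). Purely illustrative.
from transformers import AutoConfig

glm = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
neox = AutoConfig.from_pretrained("EleutherAI/gpt-neox-20b")

keys = ("hidden_size", "num_layers", "num_hidden_layers",
        "num_attention_heads", "position_encoding_2d")

for name, cfg in [("chatglm-6b", glm), ("gpt-neox-20b", neox)]:
    d = cfg.to_dict()
    # Only print keys that actually exist in each config.
    print(name, d.get("model_type"), {k: d[k] for k in keys if k in d})
```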

Also, its LICENSE seems to have restrictions similar to LLaMA's. Any ideas on what format its int4 quantized version is in?
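
For reference, if ChatGLM ends up going through the same path as the other cformers models, a ggml-style Q4_0 blockwise layout would be one option. The numpy sketch below only illustrates that layout; it is an assumption, not the format used by THUDM's official int4 checkpoint:

```python
# Hypothetical sketch of ggml-style Q4_0 blockwise quantization:
# weights are split into blocks of 32 values, each block storing one
# float scale plus 32 unsigned 4-bit quants (offset by +8).
import numpy as np

QK = 32  # block size used by ggml Q4_0

def quantize_q4_0(weights: np.ndarray):
    """Quantize a 1-D float32 array whose length is a multiple of QK."""
    assert weights.size % QK == 0
    blocks = weights.reshape(-1, QK)
    # Per-block scale from the max absolute value in the block.
    amax = np.abs(blocks).max(axis=1)
    scale = amax / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    # Quantize to [-8, 7], then store offset by +8 as 0..15.
    q = np.clip(np.round(blocks / scale[:, None]), -8, 7).astype(np.int8) + 8
    return scale.astype(np.float32), q.astype(np.uint8)

def dequantize_q4_0(scale: np.ndarray, q: np.ndarray) -> np.ndarray:
    return ((q.astype(np.int8) - 8) * scale[:, None]).reshape(-1)

if __name__ == "__main__":
    w = np.random.randn(4096).astype(np.float32)
    s, q = quantize_q4_0(w)
    err = np.abs(w - dequantize_q4_0(s, q)).mean()
    print(f"mean abs quantization error: {err:.4f}")
```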