NolanoOrg / cformers

SoTA Transformers with C-backend for fast inference on your CPU.
MIT License

Cformers

Introduction

We identify three pillars that enable fast inference of SoTA AI models on your CPU:

  1. Fast C/C++ LLM inference kernels for CPU.
  2. Machine learning research and exploration: compression through quantization and sparsification, training on more data, collecting data, and training instruction and chat models.
  3. An easy-to-use API for fast AI inference in a dynamically typed language such as Python.

This project aims to address the third pillar, building on LLaMa.cpp and GGML.
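
To make the "compression through quantization" idea from the second pillar concrete, here is a minimal sketch of symmetric 4-bit block quantization in plain Python. It is an illustration only: the function names and the one-scale-per-block layout are assumptions made for this example and are not taken from the Cformers or GGML sources, whose actual formats may differ.

```python
from typing import List, Tuple


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block of floats to integer codes in [-7, 7].

    Hypothetical helper for illustration: one float scale is stored per
    block, and each value is stored as a small signed integer.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # An all-zero (or empty) block needs no scale.
        return 0.0, [0] * len(values)
    scale = max_abs / 7.0
    # Round to the nearest code and clamp to the 4-bit symmetric range.
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale: float, codes: List[int]) -> List[float]:
    """Recover approximate floats from a scale and its integer codes."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    block = [7.0, -3.0, 0.0, 1.2]
    scale, codes = quantize_block(block)
    print(scale, codes)
    print(dequantize_block(scale, codes))
```

Each value is reconstructed to within about half a scale step, which is the trade-off that lets a weight occupy 4 bits instead of 16 or 32.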

Guiding Principles

And most importantly:

Usage

Setup

pip install transformers wget
git clone https://github.com/nolanoOrg/cformers.git
cd cformers/cformers/cpp && make && cd ..

Usage:

from interface import AutoInference as AI
ai = AI('EleutherAI/gpt-j-6B')
x = ai.generate('def parse_html(html_doc):', num_tokens_to_generate=500)
print(x['token_str'])

OR

from interface import AutoInference as AI
ai = AI('OpenAssistant/oasst-sft-1-pythia-12b')
x = ai.generate("<|prompter|>What's the Earth total population<|endoftext|><|assistant|>", num_tokens_to_generate=100)
print(x['token_str'])
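
The OpenAssistant prompt above wraps the user's question in special tokens. A small helper can build that string so the tokens are not retyped by hand. This is a sketch: `build_oasst_prompt` is a hypothetical name introduced here, not part of the Cformers interface, and the token layout is simply copied from the example above.

```python
# Special tokens as they appear in the OpenAssistant example above.
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
END = "<|endoftext|>"


def build_oasst_prompt(question: str) -> str:
    """Wrap a single user question in the prompt format shown above.

    Hypothetical helper for illustration; it only concatenates strings.
    """
    return f"{PROMPTER}{question}{END}{ASSISTANT}"


if __name__ == "__main__":
    print(build_oasst_prompt("What's the Earth total population"))
```

The returned string can be passed as the first argument to ai.generate in place of the hand-written prompt.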

OR

python chat.py

chat.py accepts the following parameters:

We are working on adding support for pip install cformers.

The following architectures are supported:

The following Hugging Face models are currently supported:

We still need to quantize the remaining models based on the supported architectures and upload them to Hugging Face. We would appreciate your help with this.

Coming Soon:

Features:

Code-base restructuring:

Models

For now, we are focusing on autoregressive generative models.

Quantization types:

Contributions

We encourage contributions from the community.

Providing feedback:

Easy first issues:

The following are some easy first issues through which you can help improve Cformers:

Issues on Machine Learning side (some are exploratory):

Non-Python side

If you are allergic to Python, you can:

You can also contribute to LLaMa.cpp and we will port those niceties here.

Misc. Notes

Our interface is still limited to generation. We are working to support other features:

We would love to hear your ideas on how we can speed up and improve the interface.

License

MIT License

Communication and Support

Discord: https://discord.gg/HGujTPQtR6