MDK8888 / GPTFast

Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.
Apache License 2.0
677 stars 64 forks source link

GPTFast

Accelerate your Hugging Face Transformers 7.6-9x with GPTFast!

Background

GPTFast was originally a set of techniques developed by the PyTorch Team to accelerate the inference speed of Llama-2-7b. This pip package generalizes those techniques to all Hugging Face models.

Demo

GPTFast Inference Time Eager Inference Time

Roadmap

Getting Started

WARNING: The below documentation is now deprecated with version 0.3.0. New docs will be up soon!

Documentation

At its core, this library provides a simple interface to LLM Inference acceleration techniques. All of the following functions can be imported from GPTFast.Core: