chu-tianxiang / llama-cpp-torch

llama.cpp to PyTorch Converter
Convert llama.cpp to PyTorch

The llama.cpp library is widely used for running language models and implements a variety of quantization techniques, but it is largely used within its own ecosystem. This repo aims to make these methods more accessible to the PyTorch community.

This repo provides an example of converting GGUF files back into a PyTorch state dict, allowing you to run inference purely in PyTorch. Currently supported models:

The code is largely inspired by the original llama.cpp and GPT-Fast.
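
To give a feel for what converting quantized GGUF weights involves, here is a minimal sketch of dequantizing the simplest llama.cpp format, Q4_0. It is not this repo's code: the function name dequantize_q4_0 is hypothetical, it uses only the Python standard library rather than PyTorch tensors, and it assumes the Q4_0 block layout from the ggml reference implementation (a little-endian fp16 scale followed by 16 bytes holding 32 packed 4-bit values). The Q4_K_M format used in the example below is a more involved K-quant layout and is not covered here.

```python
import struct

QK4_0 = 32                     # weights per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2   # fp16 scale (2 bytes) + 16 bytes of packed nibbles


def dequantize_q4_0(data: bytes) -> list:
    """Dequantize raw Q4_0 blocks into a flat list of floats (illustrative sketch)."""
    if len(data) % BLOCK_BYTES != 0:
        raise ValueError("data length must be a multiple of the Q4_0 block size")

    out = []
    for start in range(0, len(data), BLOCK_BYTES):
        # Each block begins with its scale, stored as a little-endian half float.
        (scale,) = struct.unpack_from("<e", data, start)
        packed = data[start + 2 : start + BLOCK_BYTES]

        # Low nibbles hold elements 0..15 and high nibbles hold elements 16..31.
        # Stored values are unsigned 0..15, shifted by 8 to cover -8..7.
        low = [(b & 0x0F) - 8 for b in packed]
        high = [(b >> 4) - 8 for b in packed]
        out.extend(scale * q for q in low + high)
    return out


# Usage: one hand-built block with scale 0.5 and every byte equal to 0x80.
block = struct.pack("<e", 0.5) + bytes([0x80] * 16)
weights = dequantize_q4_0(block)
```

In a PyTorch port the same arithmetic would typically be vectorized over all blocks of a tensor at once instead of looping in Python.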

Getting Started

python setup.py install
python convert.py --input tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --output TinyLlama-Q4_K_M
python generate.py --checkpoint_path TinyLlama-Q4_K_M --interactive --compile

Compiling with torch.compile takes several minutes. You can also run in eager mode by omitting the --compile flag.

Todo