Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.

Support SparseGPT #109

Open tiendung opened 1 year ago

tiendung commented 1 year ago

Sparsification should reduce model size and increase inference speed without hurting performance too much. This repo https://github.com/IST-DASLab/sparsegpt is Apache-licensed and may be useful (I hope)
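
For illustration, here is a minimal sketch of where such a pass could hook in, using plain magnitude pruning from `torch.nn.utils.prune` as a stand-in (the real SparseGPT step would replace it with its Hessian-based, calibration-driven weight selection; `model` is assumed to be a loaded lit-llama `LLaMA` module):

```python
# Hypothetical sketch: unstructured magnitude pruning as a placeholder for SparseGPT.
# An actual integration would swap the per-layer step for SparseGPT's one-shot solver.
import torch.nn as nn
import torch.nn.utils.prune as prune


def sparsify(model: nn.Module, sparsity: float = 0.5) -> nn.Module:
    """Zero out the smallest-magnitude weights in every Linear layer."""
    for _, module in model.named_modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=sparsity)
            prune.remove(module, "weight")  # bake the zeros into the weight tensor
    return model
```

Since SparseGPT prunes layer by layer against calibration data, a real integration would probably live alongside the quantization scripts rather than as a simple post-hoc pass like this.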

lantiga commented 1 year ago

Thank you @tiendung

Yes! It's very much on the radar; in fact, the "sparsification" mention in the README was initially added with SparseGPT in mind.

Let's keep this open and get it done in the next few days. BTW, would you be interested in contributing this one?

tiendung commented 1 year ago

I would like to contribute to this wonderful project. Pre-training a new model is my main interest right now. I think I'll try to contribute to train.py first, perhaps a better sampling strategy for big datasets (100 GB of text, for example)? I need to understand the training code first :)
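
For context, a rough sketch of the nanoGPT-style sampling this would improve on, assuming a single pre-tokenized file read through a memmap (the file name, block size, and batch size here are placeholders, not the repo's actual config):

```python
# Rough sketch: sample random windows from a large pre-tokenized dataset on disk.
import numpy as np
import torch

block_size = 2048
batch_size = 4

# tokens stay on disk; the memmap is never fully loaded into RAM
data = np.memmap("train.bin", dtype=np.uint16, mode="r")


def get_batch():
    # pick random starting offsets, then slice out input/target windows
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i : i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1 : i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y
```

A smarter strategy for 100 GB-scale corpora would presumably shard the data and weight the sampling per source, but that is exactly the design question to settle against the existing training code first.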

gupta-abhay commented 1 year ago

Hi,

I have offline support for LLaMA models and SparseGPT. I'm wondering what the right way is to get started with integrating it into this repository?