dhakalnirajan / LLaMA-BitNet

LLaMA-BitNet is a repository dedicated to empowering users to train their own BitNet models built upon the LLaMA 2 model, inspired by the groundbreaking paper 'The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits'.
https://arxiv.org/pdf/2402.17764
MIT License

Usage example #1

Open · andreamigliorati opened this issue 7 months ago

andreamigliorati commented 7 months ago

Can you provide an example of how to launch a training instance? How can one choose the LLaMA model size (350M, 750M, ..., 7B, etc.)? Thanks in advance.

dhakalnirajan commented 7 months ago

Hi!

The code is written in a straightforward way so anyone can read, edit, and modify it to fit their needs.

  1. To choose the LLaMA model size, work out which model best suits your needs, taking into account the resources you have for loading the base model, training it, and then running inference.
  2. Once you have chosen, you need to request access to the LLaMA family of models from Meta; the README explains where to apply. Once you have access, you can download the model weights directly from Meta or use the Hugging Face version. Note that the details you provide should match your Hugging Face credentials, so it is easiest to fill out the sign-up form for the LLaMA family models through the Hugging Face Model Hub.
  3. Once you have access to the model, create a virtual environment (for example with Anaconda) to get started, provided you have the necessary compute resources available locally. If not, you can use cloud services such as Amazon SageMaker, Google Colab, or any other service that gives you a virtual Jupyter Notebook interface.
  4. Run the utils.py code in the notebook environment, then run train.py (see the sketch after this list). After the model is trained, save it and download the weights.
  5. Create another virtual environment instance with a new session where you can run inference and perform further testing and evaluation on the model.
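
For steps 2-4, the setup looks roughly like the following. This is only a sketch, not code from this repo: `meta-llama/Llama-2-7b-hf` stands in for whichever size you pick, and you need your own Hugging Face token with LLaMA 2 access granted.

```python
# Rough sketch of steps 2-4 (not part of this repo): authenticate with Hugging Face,
# pull the gated LLaMA 2 base weights, and check that the size you picked loads.
import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

login(token="hf_...")  # your Hugging Face token, with LLaMA 2 access granted

model_id = "meta-llama/Llama-2-7b-hf"  # swap for whichever size you chose in step 1
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

n_params = sum(p.numel() for p in model.parameters())
print(f"Loaded {model_id} with {n_params / 1e6:.0f}M parameters")
```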

If you are building it locally, make sure you adjust the model's hyperparameters, such as batch_size, num_epochs, and other values, so the run fits into your compute budget and you can train the model. If you want to modify the code to improve performance, feel free to do so. Then run train.py and you are good to go.
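
As a rough illustration of the knobs worth scaling down first, and assuming the script follows the usual Hugging Face TrainingArguments/Trainer pattern (the names and values below are illustrative, not the repo's defaults):

```python
# Illustrative memory/compute knobs, assuming a Hugging Face Trainer-style setup;
# these values are placeholders to adapt to your GPU, not the repo's defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./bitnet-llama",    # where checkpoints are written
    per_device_train_batch_size=4,  # lower this first if you hit CUDA out-of-memory
    gradient_accumulation_steps=8,  # keeps the effective batch size up despite a small per-device batch
    num_train_epochs=1,             # raise once a short run completes cleanly
    learning_rate=1.5e-4,
    fp16=True,                      # halves activation memory on most GPUs
    logging_steps=50,
    save_steps=500,
)
```

Halving per_device_train_batch_size while doubling gradient_accumulation_steps keeps the effective batch size constant, so that is usually the first adjustment to try on a smaller GPU.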

Cheers! 🎉 Good luck training LLaMA 2 on BitNet. If you think the documentation needs improvement or you can improve the code quality, make sure to raise an issue and open a Pull Request.

andreamigliorati commented 7 months ago

Hi, thank you very much for your help. I managed to train the model as per your script documentation. I was wondering if there's a way to use your code to train on the togethercomputer/RedPajama-Data-1T-Sample or togethercomputer/RedPajama-Data-1T dataset? I have been trying, but I keep getting errors related to the "train"/"test" split. Thank you!

dhakalnirajan commented 6 months ago

Hi. I wonder what percentage of the original dataset defined in the script you used to train the BitNet architecture. If you can share the model weights on HF and tag me (nirajandhakal), I would love to check your model out. Also, about the RedPajama dataset, can you show me what kind of errors you are facing? I will test out the script when I am free.

andreamigliorati commented 6 months ago

Hi, I'm not sure what you mean by training percentage; I just ran the script exactly as it appears in the repo, on the openwebtext-tokenized-small dataset.

About RedPajama, I get an error related to the fact that there's no "test" split, so the script fails because there's no tokenized_data["test"] entry. What should I pass as eval_dataset? Or maybe there's a preprocessing step I have to do beforehand? Thanks in advance.
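
For reference, this is roughly the preprocessing I was considering, using the standard train_test_split from the datasets library; I'm not sure whether it is what train.py actually expects:

```python
# Possible workaround (unverified against this repo): RedPajama only ships a "train"
# split, so carve an eval split out of it before tokenization/training.
from datasets import load_dataset

raw = load_dataset("togethercomputer/RedPajama-Data-1T-Sample")  # may need trust_remote_code=True on newer datasets versions
split = raw["train"].train_test_split(test_size=0.01, seed=42)
print(split)  # DatasetDict with "train" and "test" keys, matching the tokenized_data["test"] lookup
```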