andreamigliorati opened this issue 7 months ago
Hi!
The code is written in a simple format so anyone can read, edit, and modify it to fit their needs. Run the `utils.py` file code in the notebook environment, then run the `train.py` code. After the model is trained, save it and download it. If you are building it locally, make sure you adjust the model's hyperparameters, such as `batch_size` and `num_epochs`, along with other values, so that training fits into your compute. If you want to modify the code to improve performance, feel free to do so. Then run the `train.py` file and you are good to go.
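As a rough illustration of the adjustment described above, here is a minimal sketch of a hyperparameter block you might tune before training. The names `batch_size` and `num_epochs` come from this thread; the `TrainConfig` holder and its default values are hypothetical and may not match what `train.py` actually uses:

```python
from dataclasses import dataclass

# Hypothetical config holder; the actual train.py may read these
# values differently (e.g. from argparse or a constants block).
@dataclass
class TrainConfig:
    batch_size: int = 8        # lower this if you hit out-of-memory errors
    num_epochs: int = 1        # raise this for longer training runs
    learning_rate: float = 1.5e-4

# Example: shrink the batch and train longer to fit a smaller GPU.
config = TrainConfig(batch_size=4, num_epochs=2)
```

Halving `batch_size` roughly halves peak activation memory, at the cost of slower throughput, so it is usually the first knob to turn when training does not fit on your hardware.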
Cheers! 🎉 Good luck training LLaMA 2 on BitNet. If you think the documentation needs improvement, or you can improve the code quality, please raise an issue and open a Pull Request.
Hi, thank you very much for your help. I managed to train the model as per your script documentation. I was wondering if there's a way to use your code to train on the `togethercomputer/RedPajama-Data-1T-Sample` or `togethercomputer/RedPajama-Data-1T` datasets? I have been trying, but I keep getting errors related to the "train"/"test" split. Thank you!
Hi. I wonder what percentage of the original dataset defined in the script you used to train the BitNet architecture. If you can share the model weights on HF and tag me (nirajandhakal), I would love to check your model out. Also, about the RedPajama dataset, can you show me what kind of errors you are facing? I will test the script when I am free.
Hi, I'm not sure what you mean by training percentage; I just ran the script exactly as it appears in the repo, on the `openwebtext-tokenized-small` dataset.
About RedPajama, I get an error because the dataset has no "test" split, so the script fails when it looks for a `tokenized_data["test"]` entry. What should I pass as `eval_dataset`? Or maybe there's a preprocessing step I have to do beforehand? Thanks in advance.
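For context on that error: the RedPajama datasets ship only a "train" split, so indexing `tokenized_data["test"]` fails. With Hugging Face `datasets`, calling `train_test_split(test_size=...)` on the train split creates the missing held-out portion. The pure-Python sketch below (the helper name is hypothetical, not from the repo) illustrates what that operation does:

```python
import random

def make_train_test_split(examples, test_size=0.1, seed=42):
    """Shuffle the examples and carve off a held-out test set,
    mirroring what datasets.Dataset.train_test_split does."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_size))
    return {"test": shuffled[:n_test], "train": shuffled[n_test:]}

# Carve a 10% eval set out of a train-only dataset.
splits = make_train_test_split(range(100), test_size=0.1)
```

The resulting `test` portion can then be passed wherever the script expects `tokenized_data["test"]` (for example, as `eval_dataset`).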
Can you provide an example of how to launch a training instance? How can one choose the LLaMA model size (350M, 750M, ..., 7B, etc.)? Thanks in advance.