Closed: CRH400AF-A closed this issue 1 month ago.
During training, is only one decoder layer (the draft model) trained?
Yes. The base model's features have already been extracted in advance, so the base model is not run during training.
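For intuition, the setup described above can be sketched as a toy two-stage pipeline: run the frozen base model once to cache its hidden features offline, then optimize only the small draft model against those cached features. This is a hypothetical numpy sketch with illustrative names and shapes (a simple linear layer and MSE objective standing in for the real decoder layer and training loss), not the repository's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32   # toy hidden size (a 7B model would use 4096)
n = 256  # number of cached training positions

# --- offline stage: run the frozen base model once, cache features ---
W_base = rng.normal(size=(d, d))     # frozen base-model weights (never updated)
tokens = rng.normal(size=(n, d))
features = tokens @ W_base           # cached hidden states, computed in advance

# Toy target: features the draft layer should learn to reproduce
# (stands in for the feature-prediction objective).
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
targets = features @ W_true

# --- training stage: only the draft layer's weights receive gradients ---
W_draft = rng.normal(size=(d, d)) * 0.01
lr = 1e-4

def mse(W):
    pred = features @ W
    return float(np.mean((pred - targets) ** 2))

initial = mse(W_draft)
for _ in range(200):
    pred = features @ W_draft
    grad = (2.0 / n) * features.T @ (pred - targets)  # dMSE/dW_draft
    W_draft -= lr * grad                              # W_base is untouched
final = mse(W_draft)
print(initial, final)  # loss drops; only the draft weights changed
```

Because the base features are precomputed, the training loop never holds the base model in GPU memory, which is why only the (much smaller) draft model's footprint matters at train time.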
If so, how many GPUs are needed for training, e.g. for llama2-chat-7b?
For a 7B base model, training can be done on RTX 3090s. We have not tested it on lower-end configurations.
In ./train/main.py, I see that the model is initialized only from cnet.py, which builds the draft model. So is only the draft model used during training? If so, how many GPUs are needed, e.g. for llama2-chat-7b?