Closed: CRH400AF-A closed this issue 1 month ago.
During training, is only one decoder layer (the draft model) trained?
Yes. The base model's features have already been extracted in advance, so the base model is not run during training.
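For intuition, the setup described above can be sketched as a toy two-stage pipeline: run the frozen base model once to cache its hidden features offline, then optimize only the small draft model against those cached features. This is a hypothetical numpy sketch with illustrative names and shapes (a simple linear layer and MSE objective standing in for the real decoder layer and training loss), not the repository's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32   # toy hidden size (a 7B model would use 4096)
n = 256  # number of cached training positions

# --- offline stage: run the frozen base model once, cache features ---
W_base = rng.normal(size=(d, d))     # frozen base-model weights (never updated)
tokens = rng.normal(size=(n, d))
features = tokens @ W_base           # cached hidden states, computed in advance

# Toy target: features the draft layer should learn to reproduce
# (stands in for the feature-prediction objective).
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
targets = features @ W_true

# --- training stage: only the draft layer's weights receive gradients ---
W_draft = rng.normal(size=(d, d)) * 0.01
lr = 1e-4

def mse(W):
    pred = features @ W
    return float(np.mean((pred - targets) ** 2))

initial = mse(W_draft)
for _ in range(200):
    pred = features @ W_draft
    grad = (2.0 / n) * features.T @ (pred - targets)  # dMSE/dW_draft
    W_draft -= lr * grad                              # W_base is untouched
final = mse(W_draft)
print(initial, final)  # loss drops; only the draft weights changed
```

Because the base features are precomputed, the training loop never holds the base model in GPU memory, which is why only the (much smaller) draft model's footprint matters at train time.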
If so, how many GPUs are needed for training, e.g. for llama2-chat-7b?
For a 7B base model, training can be done on RTX 3090s. We have not tested it on lower-end configurations.
In ./train/main.py, I see that the model is initialized only from cnet.py, which builds the draft model. So is only the draft model used during training? If so, how many GPUs are needed, e.g. for llama2-chat-7b?