Closed cryoco closed 2 months ago
插眼
You can use the script here for training. The code for the data generation phase needs to be modified. You need to modify the preprocess_function to ensure that the conversation matches the template and the loss_mask is in the correct position. Of course, you also need to pay attention to modifying the template during inference.
You can use the script here for training. The code for the data generation phase needs to be modified. You need to modify the preprocess_function to ensure that the conversation matches the template and the loss_mask is in the correct position. Of course, you also need to pay attention to modifying the template during inference.
Thanks for the reply! Do I need to modify the modeling too? It seems a bit weird if my current model and the additional transformer layer have different structure.
It can run without modifying the structure of the draft model, but it is still unclear whether the consistency of the additional transformer layer with the base model will affect the final performance.
Appreciate the guide on Inferencing on custom models. Is there a guide on how I can train my own eagle heads on a custom auto-regressive model?