Yes, it is theoretically possible, but our own preliminary testing showed that it is quite unstable. In their paper they call for a full pretrain. You can try making small updates to the existing layers with techniques such as LoRA to mitigate the instability. We are currently integrating this package into Llama-Factory, which will enable users to do exactly that.
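For anyone landing here, a minimal sketch of both options discussed above (router-only training vs. router plus small LoRA updates), assuming a Mixtral-style MoE checkpoint loaded with Hugging Face transformers and PEFT. The `mistralai/Mixtral-8x7B-v0.1` checkpoint, the `gate` module name, and the `q_proj`/`v_proj` targets are assumptions for illustration, not this repo's code.

```python
# Sketch only: assumes router modules are named "gate" (true for Mixtral-style
# models, may differ elsewhere) and that peft is installed.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",   # assumed MoE checkpoint
    torch_dtype=torch.bfloat16,
)


def freeze_all_but_router(model):
    """Option 1: train only the router (gate) weights; freeze everything else."""
    for name, param in model.named_parameters():
        param.requires_grad = ".gate." in name
    return model


def add_lora_around_router(model):
    """Option 2: keep the router fully trainable and adapt the surrounding
    layers with small LoRA updates, the mitigation suggested above."""
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # assumed attention projections
        modules_to_save=["gate"],             # router weights stay fully trainable
    )
    return get_peft_model(model, config)


model = add_lora_around_router(model)   # or freeze_all_but_router(model)
model.print_trainable_parameters()      # sanity check: gate + LoRA params only
```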
Closing for inactivity
Is it possible to train only the router without changing the other weights in the LLM?