Closed shenzhu1993 closed 1 month ago
Based on the Megatron-LM repository scripts and the Mixtral-8x7B paper, this PR fixes several bugs in train_mixtral_8x7b.yaml and changes some unnecessary parameters.