Closed shenzhu1993 closed 1 month ago
This PR, based on the Mixtral-8x7B paper and our tests, fixes some bugs in the train_mixtral_8x7b.yaml file and modifies some unnecessary parameters.