Closed: sohamparikh closed this PR 13 hours ago.
✨ Description
Add maximum and minimum bounds for initialized weights to improve stability during training. Inspired by Section 4.2.2 of the OLMoE paper.
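The description does not say how the bounds are enforced. The sketch below is a minimal illustration under one assumption: out-of-range draws are resampled, giving a truncated normal, rather than clamped. The function `init_weights_bounded` and its `min_val`/`max_val` parameters are hypothetical names for illustration, not the ones used in this PR. In PyTorch, `torch.nn.init.trunc_normal_(tensor, mean, std, a, b)` does the same thing in place, with `a` and `b` as absolute bounds.

```python
import random
from typing import List, Optional


def init_weights_bounded(
    n: int,
    mean: float = 0.0,
    std: float = 0.02,
    min_val: Optional[float] = None,
    max_val: Optional[float] = None,
    seed: Optional[int] = None,
) -> List[float]:
    """Hypothetical helper: draw n weights from N(mean, std^2) limited to [min_val, max_val].

    Assumption: bounds are enforced by rejection sampling (truncated normal),
    not by clamping. A bound left as None means "no limit on that side".
    Rejection sampling is only practical when the interval keeps a reasonable
    share of the probability mass, e.g. bounds a few std around the mean.
    """
    if min_val is not None and max_val is not None and min_val >= max_val:
        raise ValueError("min_val must be smaller than max_val")
    rng = random.Random(seed)
    lo = float("-inf") if min_val is None else min_val
    hi = float("inf") if max_val is None else max_val
    weights: List[float] = []
    while len(weights) < n:
        x = rng.gauss(mean, std)
        # Discard and redraw anything outside the bounds.
        if lo <= x <= hi:
            weights.append(x)
    return weights


# Usage: std 0.02 with bounds at +/- 3 std.
w = init_weights_bounded(10_000, std=0.02, min_val=-0.06, max_val=0.06, seed=0)
print(min(w) >= -0.06, max(w) <= 0.06)
```

Clamping would also keep every value inside the bounds, but it piles probability mass exactly at `min_val` and `max_val`; resampling avoids that spike, which is why the sketch uses it.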
🔍 Type of change
Select all that apply:
📝 Changes
List the key changes introduced in this PR:
✅ Checklist
Make sure the following tasks are completed before submitting the PR:
General
Dependencies and Configuration
Testing
Performance Impact
📊 Performance Impact Details
If there is any impact on performance, describe it and provide benchmark results, if applicable:
🗒️ Additional Notes
Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.