ServiceNow / Fast-LLM

Accelerating your LLM training to full speed
https://servicenow.github.io/Fast-LLM/

clamping initialized weights #48

Closed sohamparikh closed 13 hours ago

sohamparikh commented 3 days ago

✨ Description

Add configurable minimum and maximum values for initialized weights to improve stability during training. Inspired by Section 4.2.2 of the OLMoE paper.
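To illustrate the idea, here is a minimal standard-library sketch of clamped initialization: each weight is drawn from a normal distribution and then clipped to a `[min_val, max_val]` range. The function name `init_clamped_normal`, its parameters, and the example bounds of three standard deviations are assumptions made for illustration. They are not Fast-LLM's actual API or the configuration used in this PR.

```python
import random


def init_clamped_normal(n, mean=0.0, std=0.02, min_val=None, max_val=None, seed=None):
    """Draw n weights from N(mean, std^2) and clip them to [min_val, max_val].

    Hypothetical helper for illustration only; a bound of None means
    that side is left unclamped.
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        w = rng.gauss(mean, std)
        if min_val is not None:
            w = max(w, min_val)  # enforce the lower bound
        if max_val is not None:
            w = min(w, max_val)  # enforce the upper bound
        weights.append(w)
    return weights


# Example: clip at three standard deviations (an assumed choice of bounds).
std = 0.02
weights = init_clamped_normal(10_000, std=std, min_val=-3 * std, max_val=3 * std, seed=0)
```

Clipping removes the rare large-magnitude draws in the tails of the distribution. Note that plain clipping piles those draws up at the bounds, whereas a truncated normal resamples them instead; which of the two this PR implements is not stated in the description.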

🔍 Type of change

Select all that apply:

📝 Changes

List the key changes introduced in this PR:

  1. Change A
  2. Change B

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

Dependencies and Configuration

Testing

Performance Impact

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


🗒️ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.