Closed RokasEl closed 1 week ago
This implements the schedule-free optimizer (https://github.com/facebookresearch/schedule_free) and adds it as an optional dependency.
Initial tests on the 3BPA test system suggest a slight improvement over AdamW.
This PR also fixes a small bug in writing `config.yaml` files when training a single-layer model.
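
For reviewers unfamiliar with how an optional dependency is typically guarded, here is a minimal sketch. It is not the code in this PR: the helper names `schedule_free_available` and `resolve_optimizer_name`, and the optimizer name strings, are hypothetical. The only assumption about the upstream package is that it is importable as `schedulefree`.

```python
import importlib.util


def schedule_free_available() -> bool:
    """Return True if the optional `schedulefree` package can be imported."""
    return importlib.util.find_spec("schedulefree") is not None


def resolve_optimizer_name(requested: str) -> str:
    """Normalize an optimizer name and fail early if its package is missing.

    Hypothetical helper: the names "schedulefree" and "adamw" are
    illustrative and are not taken from this PR.
    """
    name = requested.lower()
    if name == "schedulefree" and not schedule_free_available():
        # Raise a clear error at configuration time rather than mid-training.
        raise ImportError(
            "Optimizer 'schedulefree' was requested, but the optional "
            "package is not installed. Try: pip install schedulefree"
        )
    return name
```

Checking availability with `importlib.util.find_spec` avoids importing the package (and anything it pulls in) when the user has not asked for it. Note that the upstream README also asks callers to switch the optimizer between training and evaluation modes with `optimizer.train()` and `optimizer.eval()`, so training loops may need a small change beyond swapping the optimizer class.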