Open le1nux opened 4 months ago
During the training of the 3.6B and 7B models with FSDP we experienced a loss spike as the model was moving towards convergence.
Things that we should check in our implementation:
Addressed in PR #143
GPT2 implementation (we could train a small model directly from Hugging Face for comparison)
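A minimal sketch of such a comparison run, assuming the Hugging Face `transformers` library: instantiate a tiny randomly initialised GPT-2 via `GPT2Config`/`GPT2LMHeadModel`, overfit a single batch, and record the loss curve so it can be checked against our own GPT2 implementation (the config sizes and dummy data here are illustrative, not the issue's actual training setup).

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)

# Tiny GPT-2 so the sanity check runs in seconds on CPU (sizes are illustrative).
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=256, n_positions=128)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Overfit one fixed random batch; the loss should decrease monotonically-ish.
batch = torch.randint(0, config.vocab_size, (4, 32))

losses = []
model.train()
for step in range(30):
    # GPT2LMHeadModel shifts the labels internally for next-token prediction.
    out = model(input_ids=batch, labels=batch)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(out.loss.item())
```

Plotting `losses` from this reference model next to the same curve from our implementation (same config, same data, same optimizer settings) would localise whether the spike comes from the model code or from the FSDP training loop.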