Llama4 FP8 Training Debug - fairscale

What does this PR do?

Fixes # (issue).

Before submitting

- [ ] Did you have fun?
  - Make sure you had fun coding 🙃
- [ ] Did you read the contributor guideline?
- [ ] Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  - [ ] N/A
- [ ] Did you make sure to update the docs?
  - [ ] N/A
- [ ] Did you write any new necessary tests?
  - [ ] N/A
- [ ] Did you update the changelog? (if needed)
  - [ ] N/A

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

facebookresearch / fairscale

Llama4 FP8 Training Debug - fairscale #1183
