feat: New Gen AI pattern - Llama2 Distributed Pre-training on Trn1 with RayTrain and KubeRay Operator

What does this PR do?

- Adds a new pattern for Llama2 Distributed Pre-training on Trn1 with RayTrain and KubeRay Operator.

Motivation

- Provide a solution for distributed pre-training of Llama2 on AWS Trainium (Trn1) instances, using RayTrain and the KubeRay Operator to run scalable training workflows.

[x] Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
[ ] Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
[ ] Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
[x] Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

[ ] E2E test successfully completed before merge?

awslabs / data-on-eks