aws-samples / awsome-distributed-training

Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
License: MIT No Attribution

Trainium llama3 #363

Closed: syedazi closed this pull request 3 months ago.

syedazi commented 3 months ago

Description of changes: This PR adds an example for training the Llama 3 8B and 70B models on Amazon SageMaker HyperPod using Trainium instances.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

KeitaW commented 3 months ago

Interim Solution for Llama 3 Training

As discussed with @syedazi, we have decided to adopt a temporary measure to address our current needs: