aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Update finetuning_llama2_7b_ptl.rst #909

Open bevhanno opened 1 week ago

bevhanno commented 1 week ago

Update readme in fine-tuning llama2 tutorial for neuronx-distributed

Description:

The current tutorial suggests SSHing into a compute instance to run the checkpoint conversion, which requires starting a compute instance manually. The suggested edit instead uses an sbatch job that runs the conversion command on a node that is started automatically for that purpose.
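
To illustrate the idea, here is a minimal sketch of wrapping a command in a Slurm batch script and submitting it with `sbatch`, so the scheduler brings up the node rather than the user. It is not the exact change in this PR: the job name, the `--exclusive` setting, and the placeholder command are all assumptions, and the real checkpoint conversion command from the tutorial would take the placeholder's place.

```python
import shlex
import shutil
import subprocess
import tempfile

# Placeholder only (assumption): substitute the tutorial's actual
# checkpoint conversion command and its arguments here.
PLACEHOLDER_CMD = ["echo", "replace-with-conversion-command"]


def build_sbatch_script(convert_cmd, job_name="convert-ckpt", nodes=1):
    """Return the text of a Slurm batch script that runs convert_cmd once."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",       # hypothetical job name
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --exclusive",                  # assumption: whole node for the job
        f"#SBATCH --output={job_name}-%j.out",  # %j expands to the Slurm job ID
        "",
        "set -euo pipefail",
        f"srun {shlex.join(convert_cmd)}",
        "",
    ]
    return "\n".join(lines)


def submit(script_text):
    """Submit the script with sbatch; return None if sbatch is not installed."""
    if shutil.which("sbatch") is None:
        return None
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    # --parsable makes sbatch print the job ID instead of a full sentence.
    result = subprocess.run(
        ["sbatch", "--parsable", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


script = build_sbatch_script(PLACEHOLDER_CMD)

if __name__ == "__main__":
    print(script)
    print("submitted job:", submit(script))
```

On a cluster where Slurm powers compute nodes up on demand, submitting the job is enough to have a node started for the conversion and released afterwards, which is the manual step this PR aims to remove.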

PR Checklist

Pytest Marker Checklist

(Coming soon...)

Reviewer Checklist

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.