Update readme in fine-tuning llama2 tutorial for neuronx-distributed
Description:
The current tutorial suggests SSHing into a compute instance to run the checkpoint conversion, which requires starting a compute instance manually. The proposed edit instead uses an sbatch job that runs the conversion command on a node started automatically for that purpose.
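The sbatch approach can be sketched as a small batch script like the one below. The script name, file paths, and the conversion entry point are assumptions for illustration, not the tutorial's actual values; only the Slurm directives and `srun` usage follow standard Slurm conventions.

```shell
#!/bin/bash
# Hedged sketch of the sbatch job described above. Paths and the
# conversion script name (convert_checkpoints.py) are hypothetical.
#SBATCH --job-name=convert-ckpt
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --output=convert-ckpt-%j.out

# Run the checkpoint conversion on the node Slurm allocates for this
# job; the node is started for the job and released when it finishes.
srun python convert_checkpoints.py \
    --input_dir /shared/llama2/pretrained_weights \
    --output_dir /shared/llama2/converted_weights
```

Submitted with `sbatch convert_checkpoints.sh`, this removes the need to start and SSH into a compute instance by hand.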
PR Checklist
[x] I've completely filled out the form above!
[ ] (If applicable) I've automated a test to safeguard my changes from regression.
[ ] (If applicable) I've posted test collateral to prove my change was effective and not harmful.
[ ] (If applicable) I've added someone from QA to the list of reviewers. Do this if you didn't make an automated test or feel it's appropriate for another reason.
PR Checklist
Pytest Marker Checklist
(Coming soon...)
Reviewer Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.