KeitaW opened this pull request 4 months ago (status: Open)
Progress update: I have drafted README.md and tested that the following scripts work:
- 0.configure-env-vars.sh
- 1.build-image.sbatch
- 3.pretrain.sbatch
- 4.finetune.sbatch
README.md in the slurm subdirectory is still WIP.

The order of the Preparation steps looks confusing to me. We basically assume that a user starts this tutorial at some arbitrary location in the filesystem that already contains the file 0.create-dot-env.sh from this repo, and we ask them to run it there to create .env and then run source .env. Then we ask the user to go to some location defined by .env and clone this repo there. In the following steps, 1.build-image.sbatch assumes that .env exists in this newly cloned location. So does the user need two .env files, one before git clone and another after git clone?
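One way to end up with a single .env is for each job script to load it from the directory the job was submitted from, instead of from wherever the user happened to start. The sketch below is hypothetical (load_env is not part of the repo) and assumes jobs are submitted from the cloned slurm directory; it reads Slurm's SLURM_SUBMIT_DIR and falls back to the current directory when run outside Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repo: load the single .env that lives in
# the directory the job was submitted from (assumed to be the cloned slurm/ dir).
# Under sbatch, SLURM_SUBMIT_DIR holds that directory; outside Slurm we fall
# back to the current working directory.
load_env() {
    local env_dir="${SLURM_SUBMIT_DIR:-$PWD}"
    local env_file="${env_dir}/.env"

    # Fail early with a hint instead of running the job with missing variables.
    if [[ ! -f "${env_file}" ]]; then
        echo "error: ${env_file} not found; run ./0.configure-env-vars.sh in ${env_dir} first" >&2
        return 1
    fi

    # set -a marks every variable assigned while sourcing for export,
    # so child processes launched by the job inherit them.
    set -a
    # shellcheck source=/dev/null
    source "${env_file}"
    set +a
}
```

A job script such as 1.build-image.sbatch could then call load_env || exit 1 near the top, so it fails immediately if .env was never created in the cloned directory.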
I suggest changing the Preparation steps as follows:
cd <Some User defined FSX location>
export FSX_PATH=$(pwd)
git clone https://github.com/aws-samples/awsome-distributed-training
cd awsome-distributed-training/3.test_cases/torchtitan-torchtune/slurm
./0.configure-env-vars.sh
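With the suggested order, 0.configure-env-vars.sh only needs to write one .env into the directory it is run from. The actual contents of that script are not shown in this thread, so the following is a hypothetical sketch: write_env and SLURM_DIR are illustrative names, and only FSX_PATH comes from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of what 0.configure-env-vars.sh could do in the proposed
# flow; the real script may define different or additional variables.
# Run from .../torchtitan-torchtune/slurm after `export FSX_PATH=$(pwd)`,
# it writes a single .env next to the sbatch scripts.
write_env() {
    local out="${1:-.env}"

    # Fail early if the user skipped the `export FSX_PATH=...` step.
    if [[ -z "${FSX_PATH:-}" ]]; then
        echo "error: FSX_PATH is not set; run 'export FSX_PATH=\$(pwd)' in your FSx directory first" >&2
        return 1
    fi

    # printf %q quotes values so paths containing spaces survive `source .env`.
    {
        printf 'export FSX_PATH=%q\n' "${FSX_PATH}"
        printf 'export SLURM_DIR=%q\n' "$(pwd)"   # hypothetical variable name
    } > "${out}"
    echo "Wrote ${out}"
}
```

Because .env is created inside the cloned slurm directory, there is no second copy sitting in the location where the user started.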
Thanks @pbelevich for the suggestion, I agree. I've updated README.md so that it guides users to clone the repository first.
The basic functionality has been implemented. I'll continue iterating on the rest in the other PRs...
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.