Closed by saforem2 6 months ago
- Adds support for `cuda=12.2` on Polaris @ ALCF
- Uses the new `conda/2024-04-29` base environment:

      $ module use /soft/modulefiles
      $ module load conda/2024-04-29
      $ conda activate base

- Adds new `ALCF/data-lists/{sunspot,polaris}/*.txt` with the Dolma v1.7 data set
- Updates the launching mechanism:
  - `mpiexec` is used to launch by default
  - `deepspeed` can be used instead by overriding `LAUNCH_CMD`: `LAUNCH_CMD=deepspeed bash train_llama_alcf.sh`
  - `LAUNCH_CMD=MPICH` will be the default, and we will launch with `mpiexec` on {Polaris, Sunspot}
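
To illustrate the `LAUNCH_CMD` override described above, here is a minimal sketch of how a script might map that variable to a launcher. It is not the actual logic of `train_llama_alcf.sh`: the function name `select_launcher` is hypothetical, and the only values handled are the two named in this PR (`MPICH` as the default, and `deepspeed`).

```shell
#!/bin/sh
# Hypothetical sketch, NOT taken from train_llama_alcf.sh.
# Maps a LAUNCH_CMD value to the launcher binary that would be used.
# An empty or missing value falls back to MPICH, which means mpiexec.
select_launcher() {
    case "${1:-MPICH}" in
        deepspeed) echo "deepspeed" ;;
        MPICH)     echo "mpiexec" ;;
        *)
            # Anything else is rejected rather than silently defaulted.
            echo "unknown LAUNCH_CMD: $1" >&2
            return 1
            ;;
    esac
}

# Resolve the launcher from the environment and report the choice.
launcher="$(select_launcher "${LAUNCH_CMD:-}")" && echo "Would launch with: ${launcher}"
```

Run with no environment override, this sketch reports `mpiexec`; run as `LAUNCH_CMD=deepspeed sh sketch.sh` (file name assumed), it reports `deepspeed`. Rejecting unknown values is a design choice of the sketch, not something the PR states.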