allenai / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

slurm script for: configs/official/OLMo-7B.yaml #699


andymvp2018 commented 1 month ago

❓ The question

Do you know the slurm script used for configs/official/OLMo-7B.yaml? I'm looking for a multi-node slurm script.

2015aroras commented 1 month ago

I'm not sure what exact script was used, but something like https://github.com/allenai/OLMo/blob/main/scripts/lumi/mitchish70.sh may be adaptable to your purposes. That script does not set any architecture-related settings.
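Not the exact script either, but as a rough starting point, here is a minimal multi-node sketch assuming a generic slurm cluster with 8 GPUs per node and a working PyTorch environment. The node count, partition name, CPU count, and rendezvous port are placeholders to adjust for your cluster; none of this is from the official OLMo run.

```bash
#!/bin/bash
#SBATCH --job-name=olmo-7b
#SBATCH --nodes=8                 # placeholder: pick your node count
#SBATCH --ntasks-per-node=1       # one launcher per node; torchrun spawns the GPU workers
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=64        # placeholder
#SBATCH --time=48:00:00
#SBATCH --partition=gpu           # placeholder partition name

# Rendezvous endpoint for torch.distributed (first node in the allocation).
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_PORT=29500

srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc_per_node=8 \
  --rdzv_id="$SLURM_JOB_ID" \
  --rdzv_backend=c10d \
  --rdzv_endpoint="$MASTER_ADDR:$MASTER_PORT" \
  scripts/train.py configs/official/OLMo-7B.yaml
```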

andymvp2018 commented 1 month ago

Thanks @2015aroras, two questions:

  1. If I set the device train micro-batch size, will this override the global batch size?
  2. What are these?

B"$PROJECT_DIR:$PROJECT_DIR" \ -B"$FLASH_DIR:$FLASH_DIR" \ -B"$SCRATCH_DIR:$SCRATCH_DIR" \ -B /opt/cray:/opt/cray \ -B /usr/lib64/libcxi.so.1:/usr/lib64/libcxi.so.1 \ -B /usr/lib64/libjson-c.so.3:/usr/lib64/libjson-c.so.3 \ $PROJECT_DIR/containers/$OLMO_CONTAINER \

2015aroras commented 1 month ago

  1. The global batch size is the total number of training instances processed in the current step. We split a batch across our GPUs, so each device gets a smaller 'device' batch (global size / num devices). A GPU doesn't have enough memory to do the whole device batch in one forward + backward pass, so we split the device batch into multiple micro-batches and do separate forward + backward passes. After all the micro-batches are done, we do the optimizer step. Overall, the micro-batch size is just about avoiding memory issues and getting good performance; it should not affect training results. You'll want the micro-batch size to be a divisor of the device batch size (see the worked example after this list).

  2. Our slurm jobs run in singularity containers (there may be ways to use other container runtimes on your system). The -B flags bind-mount directories and files from outside the container into the container, and $PROJECT_DIR/containers/$OLMO_CONTAINER is the path to the container image itself (a short bind-mount illustration follows below).
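To make the arithmetic in point 1 concrete, here's a small sketch with made-up numbers (they are illustrative, not the values in OLMo-7B.yaml):

```bash
# Illustrative numbers only, not taken from OLMo-7B.yaml.
GLOBAL_BATCH_SIZE=2048   # sequences per optimizer step, summed across all GPUs
NUM_NODES=32
GPUS_PER_NODE=8
MICRO_BATCH_SIZE=2       # sequences per forward+backward pass on one GPU

WORLD_SIZE=$((NUM_NODES * GPUS_PER_NODE))             # 256 GPUs
DEVICE_BATCH_SIZE=$((GLOBAL_BATCH_SIZE / WORLD_SIZE)) # 2048 / 256 = 8 per GPU per step
ACCUM_STEPS=$((DEVICE_BATCH_SIZE / MICRO_BATCH_SIZE)) # 8 / 2 = 4 micro-batches per step

echo "each GPU runs $ACCUM_STEPS forward+backward passes of $MICRO_BATCH_SIZE sequences per optimizer step"
```

Changing the micro-batch size only changes how many passes happen before each optimizer step; the global batch size stays whatever the config says.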
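And to illustrate what a single -B binding does, here's a toy example (the paths and image name are made up):

```bash
# Hypothetical paths and image name, purely to show -B semantics:
# the host directory /data/my_corpus becomes visible inside the container
# at the same path, so anything running in the container can read it.
singularity exec -B /data/my_corpus:/data/my_corpus my_olmo_container.sif \
  ls /data/my_corpus
```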